FOREWORD

This report contains the results of the first 18 months 
(OEG-1-7-071085-4286) from the Bureau of Research of the Office
of Education, U.S. Department of Health, Education and Welfare. 

In addition, the University of California provided contributory 
support to the project. The principal investigator was M.E. Maron, 
associate director of the Institute of Library Research, and the 
project directors were Allan Humphrey and Joseph Meredith. 

The three co-authors are jointly responsible for the contents 
of this document; each of us was involved to some extent in all 
parts. However, section 6 on MTM is almost exclusively the work 
of Joseph Meredith, section 5 is mainly the work of Allan Humphrey 
and sections 1 through 4 were authored by M.E. Maron.
1. INTRODUCTION AND SUMMARY



1.1 The Motivation

This Final Report (Phase I) presents the results of the first 
18 months of study, research, and development toward the design and 
implementation of an Information Processing Laboratory for education 
and research in librarianship. The purpose of this Laboratory, which
we have conceived of primarily as an on-line information processing 
facility, is to provide a new, active, and powerful vehicle for 
teaching and for research in the field of librarianship. The impe- 
tus behind this work is a realization of the enormous impact that 
information processing technology (e.g., the digital computer, dig- 
ital communication systems, video display terminals) will have on 
future library systems and on the profession of librarianship. 

The technology of the digital computer (and the associated con- 
ceptual techniques that are presupposed by its use) has a dual sig- 
nificance in this field. On the one hand, it provides the means to 
store, interrogate, analyze, and retrieve library data, and hence 
it will be used for automating library services. On the other hand, 
it provides an ideal vehicle to teach advanced library students 
about new principles of library science, and hence it can be used 
for education. The technology and the associated conceptual prin- 
ciples needed in order to use this technology are changing this 
field. And one of the consequences of this is the need for a re- 
structuring of the education for future library scientists. This, 
then, is the forcing motivation behind this research. 

1.2 The Setting for This Study 

This Laboratory project has been conducted in an intellectual 
setting that we believe provides a number of factors essential for 
success.

The work has been done within the Institute of Library Research 
of the University of California at Berkeley and in intimate contact 
with the School of Librarianship - a professional school that offers 
the M.L.S. and the Ph.D. degrees in librarianship. The contact 
with the School of Librarianship has provided the realism, the 
awareness of current educational problems and policies, the stimu- 
lation and critical advice on the part of faculty and students, 
that have helped to guide this project. 

The Laboratory is not an abstract idea and it is not intended 
as a facility to teach some ideal student. The Laboratory is in- 
tended to be part of an on-going graduate school of librarianship. 
The Laboratory has to "fit" and eventually its effectiveness must
be tested in a real operational library school setting. 
Diversity of talent is another important benefit that is obtained
by developing this Laboratory in a library school environment
within a great university. We have been able to engage the
talent of advanced students in disciplines ranging from electrical 
engineering and computer science, to philosophy, statistics, busi- 
ness administration and, of course, librarianship. These bright, 
active students have been contributing to the development of this 
Laboratory and, at the same time they have been learning about pro- 
blems and new methods in the field of librarianship. 

1.3 Summary of Laboratory Status 

The work during this phase has been concerned with the planning 
of the Laboratory and its development in accordance with this plan. 

The major planning task has been that of evolving the concept of the 
Laboratory in relation to the education needs of the field, while 
the development tasks have been to provide the computer programs,
data files, equipment, space, etc., which together constitute the 
Laboratory. 

To determine the educational needs of the field, we had to assess 
the state of the field, its past, its present, and its future. 
Librarianship has been primarily an applied science concerned with 
immediate operational problems, and its practices have too often
been based on precedent without any explicit logical rationale. 

Today librarianship is in a state of transition caused by a number
of factors, but mainly by the increasing demand for information
and by the enormous information processing
capability of digital computer technology. These factors are caus- 
ing those concerned with librarianship to look at the field more 
closely in order to identify key conceptual problems, to discover 
the rationale for its present practices and to start to develop, if 
possible, a coherent theory of librarianship. 

In short, the field is moving from a pre-scientific to more of 
a scientific discipline. At the same time, the practice of librar- 
ianship is changing rapidly as digital computers are incorporated 
in library operations. This is creating a demand for librarians 
who grasp the nature and techniques of systems analysis and automa- 
tic information processing. Education in librarianship, then, must 
provide for the dual nature of the field - the applied and the the- 
oretical - and we plan for the Information Processing Laboratory to 
fill both needs with emphasis on the theoretical. 

The planning has resulted thus far in definition of initial 
topics within librarianship to be supported by the Laboratory and 
this, in turn, has led to the development of computer programs for 
on-line interrogation and search, and to data files upon which to 
"exercise" these techniques. Thus far, these initial "pieces" of
the Laboratory have been assembled and have been checked out. 



However, no aspect of the Laboratory is as yet operational in the 
sense that students are using it on a regular basis. 



The Laboratory capabilities relate both to intellectual access
(e.g., associative searching, automatic indexing, automatic abstracting)
and to more traditional course content (e.g., subject cataloging).



In both teaching and research, there will be multiple modes of
providing these laboratory capabilities (e.g., on- and off-line access
through mechanical and CRT terminals, printouts, books, micro-production),
but the main one will be on-line access over remote CRT (cathode
ray tube) terminals to a computer system. We did not have CRT’s
during this period, however, and the initial on-line mode we have
established uses remote mechanical terminals.



We have developed an initial prototype capability in the two 
topics (associative search and subject cataloging). To support the 
teaching of associative search, we have created and programmed three 
retrieval routines, any one of which can be used with any one of 
three known statistical measures of term association. These routines 
are run from on-line terminals on an experimental corpus of approxi- 
mately 300 deeply indexed documents in the field of library and 
information sciences. The major contribution of the work in associative
search will be in the ability to make comparative studies
of query and retrieval effectiveness. Further, we believe that the 
most complex of these retrieval routines will itself be a contri- 
bution to retrieval technique once it is fully debugged and documented. 
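
As a hedged illustration of what a statistical measure of term association can look like, the sketch below computes a cosine-style co-occurrence coefficient between index terms and uses it to expand a one-term request. This report does not name its three measures or describe its retrieval routines, so the measure, the function names, the threshold, and the toy corpus are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt

def build_postings(corpus):
    """Map each index term to the set of documents it is assigned to."""
    postings = defaultdict(set)
    for doc_id, terms in corpus.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings

def association(postings, a, b):
    """Cosine-style coefficient (an assumed measure): co-occurrences / sqrt(n_a * n_b)."""
    docs_a, docs_b = postings.get(a, set()), postings.get(b, set())
    if not docs_a or not docs_b:
        return 0.0
    return len(docs_a & docs_b) / sqrt(len(docs_a) * len(docs_b))

def expand_request(postings, term, threshold=0.5):
    """Return terms whose association with `term` meets the threshold, strongest first."""
    scored = [(association(postings, term, other), other)
              for other in postings if other != term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored if score >= threshold]

# Hypothetical toy corpus: document id -> assigned index terms.
corpus = {
    "d1": {"indexing", "retrieval", "computers"},
    "d2": {"indexing", "retrieval"},
    "d3": {"cataloging", "computers"},
    "d4": {"cataloging"},
}
postings = build_postings(corpus)
related = expand_request(postings, "indexing")
```

Swapping in a different body for `association` while leaving `expand_request` unchanged is one way a student could compare measures on the same request, which is the kind of comparative study described above.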

To support the teaching of subject cataloging, we have written 
the prototype version of a subject cataloging course to be taught 
through the CRT terminals. This course is programmed in PILOT, a 
high-level terminal interaction language written in PL-1, and it 
operates from the Institute terminals on the computer facility of 
the University of California Medical Center, San Francisco. (Acoustic
couplers are used to connect two of the Institute’s mechanical
terminals via phone lines with the San Francisco computer, an IBM
360/50 with 256 K memory.) The subject cataloging course will be
reviewed and revised before it is used on an experimental basis.

We expect that the most important contribution of the subject 
cataloging work will be to the methods of using computer systems 
to support education in librarianship. That is, before we can make
the course available to students, we have had to extensively analyze 
and define the course content, the sequence of presentation and in- 
deed the very nature of teaching and computer presentation of topics. 
In addition, our work in methodology has led to an extension in the 
methods of programming such courses. As with the associative 
search routines mentioned earlier, these methods are not yet docu- 
mented for general presentation. 
To support the teaching and research in both areas, we have
two remote mechanical terminals connected via data sets and tele- 
phone lines to an IBM 360 Model 40 in the Berkeley campus computer 
center. This machine has a 128 K core memory and 2314 disc storage, 
plus the normal peripheral equipment. The terminals interact with
the computer under the control of a monitor system developed by the 
Institute.* See Appendix 3 for a more complete discussion. 

1.4 Future Directions and Plans 

As we move into Phase II on this project, we expect to shift 
the emphasis from planning and preparation to student involvement 
in the Laboratory. Of course overall systems planning will continue 
to some extent and will be directed more toward the student work 
within the Laboratory and toward determining the educational and 
research effectiveness of the Laboratory. 

We will continue to widen the scope and range of the Laboratory 
facilities. Specifically, we will be extending and expanding the 
assortment of programs for the on-line study of methods of intellec- 
tual access. First of all, our current routines for handling assoc- 
iative retrieval will be extended and refined. Actual Laboratory 
exercises designed to illuminate certain features of associative 
retrieval will be worked out. These exercises and guidelines for 
the on-line learning of associative retrieval will be coupled with 
an advanced course in the School of Librarianship thus making the 
Laboratory a full and integral part of the curriculum. 

Programs will be developed that will enable the computer to 
assist in making comparative evaluations of alternative searches. 
Thus, not only will the student be presented with the different out- 
puts ("relevant" documents) in response to different search specifi- 
cations, but the system will assist in providing certain quantita- 
tive measures of the differences (e.g., in terms of size and ranking 
of retrieval items). Also, we will start to select, evaluate, and 
develop measures of retrieval effectiveness to assist the student in 
determining the goodness of competing search modes. 
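
The kind of comparative evaluation described above can be sketched as follows. The sketch summarizes how two ranked outputs differ in size, membership, and ranking, and then scores each against a set of documents judged relevant. Precision and recall are used here only as familiar examples of effectiveness measures; the report does not say which measures will be adopted, and all names and data below are hypothetical.

```python
def compare_outputs(ranked_a, ranked_b):
    """Summarize how two ranked retrieval lists differ in size, overlap, and ranking."""
    set_a, set_b = set(ranked_a), set(ranked_b)
    shared = set_a & set_b
    # Rank shift for each shared document: position in B minus position in A.
    shifts = {doc: ranked_b.index(doc) - ranked_a.index(doc) for doc in shared}
    return {
        "size_a": len(ranked_a),
        "size_b": len(ranked_b),
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "rank_shift": shifts,
    }

def precision_recall(retrieved, relevant):
    """Precision and recall against a set of documents judged relevant (assumed measures)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical outputs of two search specifications for the same request.
search_a = ["d1", "d2", "d5"]
search_b = ["d2", "d1", "d3", "d4"]
judged_relevant = {"d1", "d2", "d3"}

report = compare_outputs(search_a, search_b)
score_b = precision_recall(search_b, judged_relevant)
```

Keeping the descriptive comparison separate from the effectiveness scores reflects the two steps in the text: the first needs no relevance judgments, while the second depends entirely on them.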

In addition to the current emphasis on associative retrieval, we
plan to widen the scope of topics to include routines for: 1) teaching
formal principles of subject analysis and identification,
2) on-line citation coupling, and 3) context searching. This necessitates
parallel work on a number of fronts. Not only must we design
special routines for presenting, "exercising," and teaching
these methods of intellectual access, but we will need special rou- 
tines for evaluating their behavior. 



*The monitor was jointly developed by the Laboratory project and 
the Institute's File Organization project (OEG-1-7-071083-5068).
We will expand our current experimental corpus of approximately
300 documents, both in terms of size and also in terms of the amount 
of data that is related to each corpus document. We will be using, 
for example, not only the standard bibliographic record for each 
item, but also the citations, abstract, and other contextual data 
for the corpus items. We will seek out, obtain, and integrate other 
and different kinds of experimental corpora for comparative studies 
by the students.

In the work on computer-assisted instruction, we will continue 
to refine the current computer course on subject cataloging. We 
plan to test and evaluate the effectiveness of this course by having 
a relatively large number of students use it. Also we will start to 
apply these techniques for an on-line teaching of some aspects of the 
topic entitled "reference."

1.5 Emphasis of This Report 

Our purpose in this project is not merely to design and imple- 
ment an Information Processing Laboratory, but to uncover and clar- 
ify some of the key educational issues in contemporary librarianship 
so as to be able to guide and justify our particular choices and
decisions.

We did not start this project with clear and distinct ideas as 
to exactly what ought to be done. We started with a strong intui- 
tive sense of the problem and we groped, stumbled, sometimes re- 
treated briefly, but eventually made real forward progress. We now 
feel that we have a firm grasp of the issues and that we can justify
the "design principles". Thus, we hope that the results of our in- 
quiry will be of value not only to those students who will be using 
the Laboratory here at Berkeley, but also to those educators who may 
be thinking of implementing a similar kind of educational facility 
elsewhere.

In addition to the detailed presentation of our current status*,
the emphasis in this report is on clarifying some of the central issues
that must be faced today concerning the impact of digital technology
on the future of librarianship and on the education of future
librarians.

1.6 The Organization of This Report 

This Final Report is divided into six sections of which this 
Introduction is the first. A discussion of the field of librarian- 
ship and its probable future direction is contained in section 2,
entitled "Current Trends in Librarianship". From this view of "what 



*For those who would like to have a brief chronology of our activities,
see Appendix I.
is happening" flow certain implications for education, and these are
discussed in section 3, "The Implications for Education". This sets 
the stage for a general presentation of the nature and scope of an 
Information Processing Laboratory, which is contained in section 4. 

The final two sections report in detail on our laboratory devel- 
opment effort. The on-line teaching of associative retrieval is 
contained in section 5. The on-line teaching of traditional subject 
cataloging is contained in section 6. 
2. CURRENT TRENDS IN LIBRARIANSHIP



2.1 Initial Remarks 

What kind of education is needed to prepare the new leaders 
in the field of librarianship?* The answer to this key question 
can come only after a closer look at what is happening in librarianship
today and what the potential implications are for the
future. Thus, in this part of the report we will look at some of 
the forces behind the current changes in librarianship. Specif- 
ically we will consider the digital computer and its great impact, 
present and potential, on both practical and theoretical librarianship.

Librarianship today is a profession in a state of transition. 
This changing nature of librarianship has implications not only 
for the future of libraries and library research, but also for the
educational requirements for future librarians. 

One way to characterize the change is to describe it in terms 
of a transition from a pre-scientific to a scientific discipline. 
Traditionally, librarianship has been a strictly practical pro- 
fession - one that has been looked at (and that has seen itself) 
primarily as service oriented. In an exaggerated sense, we could
say that the profession never fully looked inward at its own sub- 
ject content in order to formulate and empirically justify some 
of its general principles. In the pre-scientific state a profession
justifies its procedures and practices primarily in terms
of rules of thumb, and in terms of its history, its traditions.

And this has been the case with librarianship. 

As a profession moves to a more scientific state, it begins 
to identify and explicate fundamental concepts and it begins to 
formulate principles that can guide and logically justify its 
practices. Contemporary librarianship is becoming more analytic, 
self-critical, more scientific and research oriented. It is be- 
ginning to see its own subject content more fully and clearly, and 
it is beginning to apply keener tools of analysis to its own problems.

Thus the subject content of contemporary librarianship, which 
can be characterized as the general problem of information identi-



*Here as elsewhere throughout this report our concern in librar- 
ianship is on the so-called "information science" aspects and not 
on such specialties as history of the book, history of printing, 
history of libraries, international and comparative librarianship. 
fication, storage and transfer, is beginning to be unfolded and
treated in more rigorous ways using new conceptual tools and techniques
of logic, mathematics, statistics and systems analysis.
Librarianship is a profession with both empirical and theoretical
content. In its most fundamental aspects, librarianship involves
the following kinds of problems: the nature of knowledge and the
notion of an information need; the nature of information and how
it can be represented, identified, and communicated; the structure
of language and how it might be analyzed formally; the meaning of
"content" and "about"; and, of course, the meanings and measures
of "relevant".

There are numerous important implications and consequences 
that flow from our account of the scope of librarianship. These 
implications relate to the design and organization of libraries 
of the future, they relate to the kinds of research and development
that are most relevant to this changing field, and most important
for our purposes, they relate to the education and training
that is required to prepare a new generation of library scientists
who can participate creatively in this rapidly moving field.



2.2 Increasing Demand for Information 

The pressure of an exponentially growing population of books, 
journals, technical reports, etc., certainly can be cited as one of
the forces behind much of the current activity in librarianship. 
And, as the birth rate of books (and other forms of documentary 
information) booms upward, the problems of storage and access are
multiplied at an even greater rate. But this accelerating input 
is only one dimension of the problem; another is the growing pop- 
ulation of people who need access to that literature. This greater 
and more diversified demand for information results in part from 
a new awareness of the nature of information. Increasingly we 
are tending to view information as a commodity to which one can
assign a value. People want information for pleasure and understanding,
for prediction and control; we use information in purposeful
ways - whatever our purposes.

Thus, in principle, the value of information can be related 
to its effectiveness in helping its recipient to achieve his 
purposes. Whether it be small organizations, large industrial
firms, or agencies of our government, there is a growing incli- 
nation to view information as an essential resource that must be 
protected, enriched, and intelligently exploited. Thus, there is 
pressure from many sectors of our society to develop ideas, tech- 
niques, and systems to properly handle these valuable resources of 
published information. Increasingly, there is a sense of urgency 
to get on with the important job of designing improved library 
systems to acquire, identify, store, retrieve, and disseminate 
information to a growing, diverse class of users. 
2.3 The Computer

The major force that is both accommodating and effecting 
rapid transition in the field of librarianship is the digital 
computer and its related technology. There has always been a
mutual interaction and influence between science and technology. 

As science develops new physical principles, the principles become 
embodied in a variety of new technological devices, some of which, 
in turn, (such as measuring instruments) contribute to the contin- 
ual development of science. 


Digital computer technology is just over two decades old, but 
the growth and improvements in this field have been most dramatic. 

We now have really high speed and reliable central processing units; 
but even more important for library purposes we now have very high 
capacity magnetic memory systems which can store hundreds of thou- 
sands of bibliographic records in digital form. And, of course, 
there are high density, nonerasable digital photographic memory
units that have a capacity of over a trillion bits.

New developments in on-line, time-sharing systems, high cap- 
acity communication channels, and inexpensive and easy-to-use 
terminals allow one to communicate with a central system from re- 
mote locations. Because it is possible to formulate very complex 
rules as a long sequence of very elementary computer instructions, 
any kind of a task that can be described in complete and unambig- 
uous detail can be implemented as a program for a computer and 
automated. Thus the computer becomes a powerful tool for analysis, 
for simple data handling and processing, for teaching and for re- 
search. 

2.4 The Impact on Applied Librarianship 

Applied librarianship denotes that large class of problems
concerning the practical operation of existing library systems.
Its concern is with the immediate, pressing, operational problems
that face today's librarians.

Not unlike other organizations such as banks and insurance 
companies, there are many strictly clerical functions in libraries 
that are now being mechanized via the digital computer. It seems 
quite clear that this trend to automate most of the strictly 
clerical functions within libraries will continue. For a fuller 
discussion of these issues the reader should see the SDC Technical 
Report "Technology and Libraries"*. 



*C. Cuadra, et al., "Technology and Libraries," TM-3732. Systems
Development Corporation, Santa Monica, California. November 15, 1967.
*"Information Networks," Annual Review of Information Science
and Technology, Vol. III (1968), Chap. 10, pp. 289-327.
they concern automated information systems and their evaluation.
Theoretical librarianship also encompasses all the related con- 
ceptual tools and techniques of computer science, cybernetics, 
and information theory most broadly conceived. 

The emphasis in theoretical librarianship is on finding 
organizational principles that can be formalized so that most of 
the operations required for information storage, analysis, identi- 
fication, and transfer can be automated as an on-line system of 
information interrogation, search and retrieval. The goal, of 
course, (possibly never to be fully achieved) is to have a computerized
library system with the full text of the documents in
machine form and so organized that a patron can express his re- 
quest for information in ordinary English. The automated system, 
in turn, would interrogate the user further as needed, and then 
search, identify, and retrieve all and only the desired information. 

The problems involved in the design of library information
systems both for literature searching and for factual question
answering are enormously complex intellectual problems. In
order to solve these problems, one must come to grips with the
following questions which belong within the domain of theoretical
librarianship:

What is the meaning of "information need",
"relevance", "about", "subject content"?

What is the meaning of "similar in meaning"
and "similar in content", and how can the
above relationships be measured?

By what set of rules and procedures can
documents be related (clustered, grouped)
according to measures of similarity?

What are the best measures for the retrieval
effectiveness of a library system?

By what set of rules and procedures can
documents be identified for retrieval
purposes?

How can problems of encoding, storage,
and access be related so that a library
file is optimally organized to provide
the best response time in the most
economical way?

The problem of how requests and documents can be analyzed in 
order to retrieve all and only the relevant items (and possibly 
have them ranked by some measure of their degree of relevance) is 
called the problem of "intellectual access". During the past
decade, a growing number of research projects have directed their
efforts toward the study of new and improved computer techniques
for obtaining deep intellectual access to stored documents. And,
as the so-called "library problem" has intensified and as computer
memories have been increased in capacity with lowering costs,
experiments on the problem of intellectual access have intensified.
(For a review of the scope and variety of current activity in
this problem area one should consult chapters 4, 6, and 9 of the
Annual Review of Information Science and Technology, 1968, Vol.
III.)

For example, one of the features of contemporary work in
librarianship is the use of quantitative concepts and not classificatory
concepts alone. Thus, instead of either assigning an
index tag or not, one can assign a tag with a weight which would
represent the degree to which the indexing holds for the document
in question. One interpretation for weighted indexing is the
following: to say that index term I_j holds for document D_i with
the weight W_ij (where 0 <= W_ij <= 1) is to say that there is a
probability (whose value is estimated as W_ij) that if a patron
were to be satisfied with the information contained in D_i, he
would be requesting information using the index tag I_j. Given a
request for information with a weighted indexing scheme under the
above interpretation, the library system must, of course, perform
a fair amount of computing in order to determine which document is
most probably relevant, next most probably relevant, etc. Thus,
in general, in order to deal with quantitative concepts and quantitative
measures of match, we must use a machine because it would
be impossible, for all practical purposes, for a human to do the
kind of computing necessary in quantitative searches.
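
To make the computing involved concrete, here is a minimal sketch of ranking documents by probable relevance from weighted index tags. It reads each weight as the probability estimate described above and scores a document in proportion to a prior times the product of its weights over the request tags. Treating the tags as independent, defaulting to uniform priors, and every name and number in the example are assumptions of this sketch, not details given in this report.

```python
def rank_by_probable_relevance(weights, request_tags, priors=None):
    """Rank documents for a request under a Bayes-style reading of weighted indexing.

    weights[doc][tag] is read as an estimate of the probability that a patron
    satisfied by `doc` would have used `tag`. Scores are proportional to
    prior(doc) * product of weights over the request tags (assumed independent).
    """
    docs = list(weights)
    if not docs:
        return []
    if priors is None:
        priors = {doc: 1.0 / len(docs) for doc in docs}  # assumed uniform priors
    scores = {}
    for doc in docs:
        score = priors[doc]
        for tag in request_tags:
            score *= weights[doc].get(tag, 0.0)  # an unassigned tag has weight 0
        scores[doc] = score
    total = sum(scores.values())
    if total > 0:
        scores = {doc: s / total for doc, s in scores.items()}  # normalize to sum to 1
    # Most probably relevant first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical weights for three documents.
weights = {
    "D1": {"retrieval": 0.8, "indexing": 0.5},
    "D2": {"retrieval": 0.4, "indexing": 0.5},
    "D3": {"cataloging": 0.9},
}
ranking = rank_by_probable_relevance(weights, ["retrieval", "indexing"])
```

Even in this toy case every document must be scored for every request, which suggests why the text holds that such searches are impractical by hand for a corpus of realistic size.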

However, the problems of intellectual access cannot be solved 
by technology alone. No amount of computer memory can solve the 
problems because they are, fundamentally, intellectual problems. 
These conceptual problems that block the road to full library
automation are very deep and very complex. We are only just
beginning to see in a clear way what some of the dimensions of
these problems are.

2.6 The Future of Libraries and Librarianship 

Libraries of the future will be very different kinds of 
information systems than those in the past. The computing machine,
digital communications systems, and video display terminals offer the
technology needed to construct highly automated information systems 
in which large amounts of bibliographic data and full text are 
stored in digital memories and where a variety of special data 
banks are interconnected to form centralized repositories accessible 
to a large number of remote users. The technology is here, but 
there are complex conceptual problems that must be solved before 
the technology can be properly exploited in the service of librarianship.
Clearly the rate of progress toward the development of
such systems hinges greatly on those young people who are now 
preparing for a career in theoretical librarianship - those who 
hopefully will be solving the conceptual problems mentioned above. 
Also, of course, the very image of on-line fully automated library 
systems will influence the direction of the education and research 
of current doctoral students. The profession and our libraries 
are changing, under the impact of many forces, one of the most 
important of which is the computer. Clearly, librarianship is in 
a state of transition. 

And, finally, one of the things that emerges from an examination
of the impact of the computer (and the related concepts from
the information sciences) is a change in the very conception of 
librarianship. Future librarians will talk less about books and
bookshelves, LC numbers and shelf lists, call numbers and circulation
desks, and instead will cast their analyses in the language
of information identification and information processing,
control, communication, and evaluation of information systems. 

But we mean more than this. In the past when one thought of a 
library, one thought of the "Collection" - i.e., the contents
(books, manuscripts) of the library. Clearly the documents (or 
better yet the information they contain) are necessary ingredients 
of any and all library systems. But an equally critical element,
above and beyond the content of a library, is the set of processes and
procedures for identifying and retrieving, i.e., the rules for
bibliographic organization and access. A library, in a very
fundamental sense, must be conceived of as the collection plus
the processes needed for proper access.

And, as librarianship moves toward a more scientific state, 
library scientists will attempt to formulate optimal rules and pro- 
cedures for on-line interrogating, identifying, relating, searching, 
selecting, and disseminating information. Library search tactics 
and strategies for on-line interrogation and search will have been 
formulated and refined. This suggests the active view of a library
as a process, not a static view of a library as a collection (plus
the catalog scheme). The inner structure of the subject of librar- 
ianship is the study of the processes for identification and access. 
3. THE IMPLICATIONS FOR EDUCATION



3.1 Initial Remarks 

Since the digital computer is here to stay, a central question 
now is how and where the library profession should move to modify 
and update the education of its students. Where and how should the 
curriculum be enriched? What kinds of new courses and educational 
facilities are required? 

3.2 Aims of Education in Librarianship 

Broadly speaking the aims of education (excluding job training) 
are the same regardless of the specific field. Education is preparation
for the future: teaching a person how to grow intellectually
and how to analyze and evaluate, regardless of the subject matter.

As a student advances within a specific field (e.g., in graduate 
school) the content of his education becomes more specific and the 
emphasis becomes more directed, although the basic aims should remain 
the same. 



If education is intelligent preparation for the future, and if
the future of librarianship includes, among other things, a larger
role for computers and information processing, where and how must
education for librarianship be modified to account for this?
Meeting the aims of education in librarianship increasingly means
teaching students to be able to analyze and evaluate the field in terms
of the information sciences and of the application of computer
technology. With this kind of education, they will be able to read the
growing literature in the field and intelligently apply information
science techniques. This applies to doctoral students who will
eventually become researchers and educators as well as to master's
students who will soon become practitioners in the field.

The M.L.S. program within many library schools is designed to
train people with an emphasis on applied librarianship, while at the
Ph.D. level the educational emphasis shifts to problems of research
and to an understanding of the basic underlying logical
structure of the discipline. Let us look more closely at the impact
of the computer on education in applied librarianship and on
education in theoretical librarianship.

3.3 Implications for Applied Librarianship 

In applied librarianship the emphasis is on the immediate,
pressing, operational problems that currently plague our libraries.
What kind of education will be most relevant for the library
practitioner in order to prepare him for the extended mechanization of
the library in the future? What topics need to be mastered, and at
what level of depth and detail? And, very importantly, how can these
subjects best be taught in library school? The amount of time that
students have available for formal education is limited.
Thus, it is not practical to demand or require all of the course
work that it would be "nice" for the student to have. What are the
relative priorities?

Needless to say, we do not have any final, uncontested answers. 
Our purpose here is to indicate what we feel is important in the 
education for librarianship and why we have moved in the directions 
that we have in the design of the Information Processing Laboratory. 

It seems quite clear that library school students should have
some reasonably adequate background in elementary logic, mathematics,
and statistics. (And, incidentally, what is adequate today will
surely need to be strengthened in order to be adequate in the near
future.) In addition to a knowledge of these formal tools of
analysis (elementary logic, mathematics, and statistics), library students
should have training in the techniques and methods of systems
analysis and operations research. These new methods for analyzing,
synthesizing, and evaluating complex information systems, such as
libraries, are becoming increasingly important for a career in applied
librarianship.



The next group of topics that these students should know are 
those that cluster around the computer, the principles of program- 
ming, and library applications of information technology. Surely, 
no college graduate today can claim to have a broad education if he 
does not understand the elements of computer organization, program- 
ming, and some kinds of applications. And, in the case of education 
for librarianship, it is absolutely essential that students know 
about the digital computer, principles of its logical organization, 
and principles of programming. Again, the computer (and its related 
technology) will be one of the essential ingredients of future li- 
brary systems and future library practitioners must have a strong 
grasp of this very relevant technology. Moving, once again, from
the general to the more specific, it would seem that some background
in library automation should be required. The student could be
taken in some detail through the logical steps leading from an initial
system analysis of some library subsystem (e.g., serials control),
through problems of encoding, file conversion, and file organization,
to design and coding of the computer routines, to testing,
evaluation, etc. Thus, the student would learn about the entire
process for what is admittedly a small piece of library automation.

3.4 Implications for Theoretical Librarianship

In theoretical librarianship the emphasis is on long-range
research problems - the search for fundamental clarification of key
library science concepts and the search for principles of information
identification, organization, association, search, and retrieval.
What does the impact of the computer imply for education for
theoretical librarianship?

In the specific case of those Ph.D. students who wish to
specialize in the information sciences (or theoretical) aspects of
librarianship, what are the special needs and educational requirements?
In addition to a strong background in logic and mathematics (the
formal tools for analysis), they need a strong background in the
information sciences, e.g., information theory, computer organization and
programming, properties of on-line operating systems, and principles
of file organization. And although the primary emphasis of the Ph.D.
students will be on theoretical librarianship, they must have a
good grasp of the problems and solutions of applied librarianship.
Hence, there is the educational requirement for the study of systems
analysis and its application to the mechanization of clerical
functions in libraries. However, a major part of the problems of
theoretical librarianship are the problems of intellectual access, i.e.,
the study of formal methods for analyzing and retrieving stored
information in response to a request for information.



3.4.1 The Meaning of Intellectual Access 

The problems, techniques, and methods of intellectual access 
are crucial in the study of theoretical librarianship. The computer
and its potential in libraries of the future place a demand on the
profession that the important subject of intellectual access be
taught, and taught as effectively as possible.

The problem of intellectual access denotes the problem of how 
to analyze, identify, search, relate, and retrieve documents that 
are relevant to a library patron's information need, relative to the 
request that he submits. Formal techniques for intellectual access 
are those techniques related to the analysis and processing of lin- 
guistic expressions that can be described solely in terms of the 
location and form of the expressions and that make no reference to 
their meanings. Thus formal techniques are those that are translatable,
in principle, into a computer program; i.e., they can be described
completely, unambiguously, and without reference to their
semantic content.

Automatic indexing is an example of a technique belonging to
the topic of intellectual access. Automatic indexing is the method
of deciding in a formal way (i.e., based on the occurrence, frequency,
and grammatical form of the words [and other clues] in a document)
the subject heading(s) under which a document should be indexed.

Another class of techniques belonging to the domain of
intellectual access is that concerned with statistical measures of
association, closeness, and distance as applied to literature searching and
retrieval. A wide variety of formal rules for deciding about
closeness between index tags and about distance between document
representations have been described in the recent literature in the field
of librarianship. These techniques are central to the problem of
intellectual access.

3.4.2 Research Requirements

A critically important part of the education of the Ph.D.
student centers around research, i.e., preparing the student to
conduct independent, original research by having him produce a doctoral
dissertation. An experimental corpus - at least partially in
machine form - will be required for those students who choose to do
empirical research involving, perhaps, the design, testing, and 
evaluation of some new technique for obtaining access to stored lit- 
erature (as for example, use of a new technique for automatic index- 
ing, weighted indexing, associative searching, etc.). In addition 
to an experimental corpus in machine form, such a student will need 
certain computer facilities and appropriate software designed to 
handle his class of library data processing problem. 

One of the major educational needs in research librarianship, 
then, is a new kind of research facility giving students easy access 
to the information processing tools they need to conduct empirical 
research in this field. This facility would in a sense be a counter- 
part to the empirical research facility that, say, a linear acceler- 
ator provides for an advanced physics student. 
4. THE NATURE AND SCOPE OF AN INFORMATION PROCESSING LABORATORY

4.1 Initial Remarks 

We conceive of an Information Processing Laboratory as a new 
kind of educational and research facility designed specifically 
to extend, enhance, and provide an innovative mechanism for the
education of future library scientists. We interpret this Laboratory
as a set of remote terminals connected to a central high speed, 
general purpose, digital computer. The system is designed so that 
students may sit individually at a terminal and proceed to call 
up a variety of different kinds of library related procedures. 

The student can "exercise" the procedures and evaluate the conse- 
quences, introduce modifications, make comparative studies and, 
thus, gain a new kind of insight into the problems and techniques 
by actually controlling and observing their behavior. In addition,
the Laboratory will provide the means to teach some topics
utilizing the computer in the instruction. And the Laboratory will
provide the facilities, programs, experimental data bases, etc.,
needed by advanced students for empirical research.

4.2 Possibilities and Priorities 

In an Information Processing Laboratory, one might want to
teach: principles of file organization; on-line use of citation
indexing; study of circulation flow via simulation models; computer
search techniques; library data processing programming; techniques
of query formation; measures of retrieval effectiveness; and so on.

Needless to say, it would be impossible to do everything in 
such a Laboratory because the resources are fixed. And even if 
the resources were unlimited, one would have to proceed in a serial 
fashion - a step at a time. Thus we face the question of what is 
the best first step. What should we start with and why? 

The fulfillment of the rosy predictions about on-line, fully
automatic text searching systems hinges on successful research on
the problems of intellectual access. And this is the area that
we see as most in need of emphasis for the Ph.D. students, because
educating the Ph.D. student in librarianship is synonymous with
educating the future leaders and teachers of the profession.
It is primarily for this reason that we have laid great emphasis
on teaching the principles and methods of intellectual access in the
Laboratory. (See Section 4.4.)

4.3 Library Systems and the Instructional Role of the Computer 

One of the notions that emerges in thinking about future 
library systems concerns the problem of teaching a patron how to 
use such a system. Assume that, in some not too distant future,
we have a large-scale, on-line interrogation, search, and retrieval
system. In order to justify the heavy costs in the development and 
operation of such a system, its retrieval effectiveness will have 
to be reasonably good. But we know that no simple information 
processing of a patron’s request and the document file will be 
adequate because the problems of analyzing requests and documents 
are exceedingly complex. 

In order to get at relevant documents in response to a
request, the library system and the patron will have to engage in an
iterative process of communication. Given the request, the system
makes a first "move" (i.e., an initial identification and access)
and then provides the patron with some feedback (perhaps it
displays a sample of the kinds of documents it has selected after
this first access). The patron must now decide which of the next
possible group of alternative search "moves" would be the best.
He must participate in the decision of how the search should
proceed. This two-way interaction continues to direct and modify the
search until the patron feels satisfied (or else too frustrated
to continue).

The point of this is to say that before we ever arrive at the 
fully automated on-line library search system (if we ever do), we 
will have developed rather effective semi-automated systems in
which the system and the patron interact in order to jointly guide 
the search. Now in order for the patron to work effectively with 
such a complex system, he must understand how it operates - what 
the search tactics and methods of intellectual access are. The 
ordinary patron cannot be expected to know the full complement of 
possible search "moves" or what exactly is implied by the use of 
them. Therefore, periodically he may have to interrupt the search
and call upon the system to "explain" the meaning and use of
certain alternative search options. Thus, we see in library systems
of the future the need for rules, procedures, and methods whereby
the system can actively instruct the patron about how to guide a
search. It is for this basic reason that we have emphasized our
work on computer instruction (see Section 4.6).



4.4 Intellectual Access 

4.4.1 Statistical Techniques 

Of all the logical tools and techniques for intellectual 
access that have so far been described in the professional lit- 
erature of librarianship, which should be selected for study? 
Again, we cannot implement all of them, nor would we want to. Some
techniques are more important and potentially more valuable than
others. Some techniques represent only minor variations of others.
Some may be logically prior to the study and use of others. Thus
a first problem that faced us in the design of this Laboratory was
the evaluation of the class of tools for intellectual access and
the selection of one, or a few, to be developed initially.

One might want to teach:

1) Formal methods for thesaurus and dictionary
construction

2) Techniques for automatic indexing, automatic
extracting, or automatic classification

3) Techniques of relevance feedback for request
modification

4) Techniques for computing degrees of relevance
depending on a variety of matching procedures

5) Techniques for measuring the retrieval
effectiveness of a literature searching system

6) Techniques of clustering, clumping, grouping
and their use in library situations

7) Statistical techniques for computing degrees
of correlation and closeness between two
properties

These measures of statistical closeness have a variety of
applications in the general problem area of automatic literature search
and retrieval.

As a field of study matures, there is a constant attempt to
replace classificatory (two-valued) concepts with comparative
concepts. In the case of literature searching systems this means,
among other things, the attempt to compute degrees of relevance,
to compute degrees of closeness between index terms, and to compute
degrees of closeness between documents - i.e., statistical measures.
In order to implement such systems two requirements must be
satisfied. First, of course, comparative concepts must be developed
so that one can at least say what degrees of relevance, closeness,
etc. mean (and how to compute them). Second, there must be
some kind of mechanical device to do the computing, since it would
be unthinkable for a human to do the computing needed
to deal with requests for information. Because the computer is
exactly the device that will compute such functions at high speeds,
and because of the wide range of applications and the potential
power of statistical measures, we decided to start with computer
routines for teaching the meaning and use of these techniques.
4.4.2 Associative Retrieval

One of the potentially most valuable uses of a statistical
measure of closeness is for the process of associative retrieval.
Associative retrieval can take a number of different forms, but
basically it is a formal method for retrieving documents whose
index tags are only close to a given specification. We emphasize
the notion of "close" to distinguish associative retrieval from a
conventional retrieval system that requires a direct match between
a document index tag and a search specification. Thus, for example,
if a user makes a request for all documents on the subject, say, I_j,
then in a conventional system all and only those documents actually
indexed under I_j would be selected for retrieval. However, there
might be other documents not indexed under I_j but under a different
term I_k (one which is close, in meaning, to I_j), and some (or all)
of these documents might be relevant, relative to the user's
information need. If the library system could compute measures of
closeness between all pairs of index terms, then it might
automatically elaborate on that initial request and thus retrieve
relevant documents which would not have been obtained by the initial
request. Furthermore, the system could assign weights to those
additional documents retrieved and could rank the final output of
documents according to the numerical value of these weights.

All sorts of complex and important questions can be raised
about how best to perform associative retrieval. Given
the capability to do associative retrieval, students can investigate
such questions as the following: Under what conditions does any
one measure of closeness produce better results than any other measure?
How can one decide which measures are best for certain kinds of
requests and why they perform best? Given a retrieval system that
allows associative retrieval, how should this capability influence
the way a user should formulate a request? Associative retrieval
always broadens a search - it does not narrow it. And it can
sequentially broaden a search in different ways. What are the
implications of this broadening influence? At what stages in the search
should elaboration take place - and then which of the different
directions of elaboration should best be pursued? We are building
an Information Processing Laboratory so that advanced students in
librarianship may learn about these kinds of questions and begin
to get good answers for them.

In section 5 we discuss our associative retrieval work in detail. 

4.5 Intellectual Access: Complexity and Understanding

Here we want to emphasize why formal principles of intellectual 
access should be taught in an on-line Laboratory. Formal techniques 
for intellectual access can be thought of as tools for enabling one 
to identify and retrieve documents relevant to a request. They 
can be interpreted as tools for performing logical work on stored
data. These are complex tools and a complex intellectual process
is required on the part of students in order to fully grasp and 
understand the nature of these tools. We believe that the stan- 
dard, traditional way of teaching students about this class of 
logical tools is inadequate and that tools of intellectual access 
can best be taught, learned, and comprehended via an on-line 
Laboratory. 

The understanding of logical tools for intellectual access 
comes only with an understanding of what happens when these tools 
are used. To understand is to understand actual as well as po- 
tential consequences of use. And this kind of understanding of 
"logical behavior" can come only with exposure to the use of such 
tools under a proper variety of representative circumstances. These
tools must be exercised - put through their paces - by students, 
and the best vehicle for doing so is the digital computer. 

All of the logical tools of intellectual access can be pro- 
grammed as part of a system so that they can be called up and 
used on a variable set of documents. The results of such use can
be studied by still other comparison techniques with the assis- 
tance of the computer. Thus we see the computer, in an on-line 
mode, as the ideal vehicle to study tools for intellectual access. 

4.6 Machine Tutorial Mode 

In planning this Laboratory, we decided to automate, to
whatever extent was reasonable, the presentation of and instruction in
the use of the materials on the subject of intellectual access.
However, we could not implement a "course" on methods of intellectual
access until we had developed the necessary course content, i.e.,
until we had the search languages, data bases, association files,
evaluation routines, etc. But we realized that we could proceed
in parallel to develop tools for the on-line study of intellectual
access and also computer-assisted-instruction techniques, if we
selected for the content of the latter a topic in traditional
librarianship. We selected the topic of subject cataloging. (See
Part 6 for details concerning our decision to start with this
particular topic within traditional librarianship.)

Our justification for the decision to present course material
in the machine tutorial mode was the following: In addition to the
possibility of a more efficient introduction to the subject, the
student could get at the same time an introduction to on-line
information processing, a hands-on experience in interacting with
a system via a remote terminal. This experience would positively
influence his thinking about on-line terminal processing in other
facets of librarianship. We felt that this machine tutorial
approach could be especially valuable for those students who were
entering the Ph.D. program without having first completed a
standard M.L.S. degree. Also, we felt that whether or not such a
computer-assisted course in subject cataloging were ever adopted, the
process of analyzing, synthesizing, and restructuring existing course
material for machine presentation could be beneficial. The results
of this kind of analysis, when fed back, could make the course better
when presented again by the regular instructor.
5. ASSOCIATIVE RETRIEVAL



5.1 Introduction and Summary 



5.1.1 Initial Remarks 

As we stated earlier in this report, one of the major objectives 
of the laboratory is to offer facilities for teaching, demonstrating, 
and experimenting with techniques of intellectual access. There are 
many of these techniques that lend themselves to demonstration in
the laboratory; we chose associative retrieval as the first
technique for which we would develop laboratory tools.



5.1.2 Types of Word Association 

Associative retrieval methods are based on the relationships 
which exist between terms used as content identifiers for a
collection of documents. These relationships may be semantic, syntactic,
or statistical. Semantic associations of terms depend upon the 
meaning of terms as governed by common usage - the meaning which is 
found in the dictionary. Thus "sofa" and "couch" are semantically 
associated because of their equivalence of meaning. Semantic word 
associations have a universal validity in that they rely upon the 
relationships of meaning which exist in language itself. 



Syntactic associations between terms take into account the con- 
text in which terms appear - usually a single sentence or phrase. 
These associations depend on the positions of terms in a string of 
text as determined by the rules of grammar. Thus, in the
grammatical expression "freedom of speech," "freedom" and "speech" are
syntactically associated. Syntactic associations are not universally
valid, but depend on the structure of the sentence under examination. 



Statistical associations treat terms as discrete, isolated
entities having no semantic or syntactic connection with other terms.
This kind of association between terms is dependent upon the statis- 
tics of term usage within a given document collection. Terms which 
are statistically highly correlated with one another in actual usage 
are considered to be associated. For example, in a computer sciences 
collection, "data" and "processing" may be highly associated statis- 
tically since they are found together frequently in the literature 
of computing. Statistical word associations do not have a universal 
validity; they depend entirely on the way terms are used in a
particular collection, and may be applicable only to the collection from
which they are drawn.



In summary, semantic word associations reflect the absolute
meaning of words within the context of language as a whole.
Syntactic word associations reflect the proximity of words in grammatical
structures within the context of a single sentence or phrase.
Statistical word associations reflect the usage of words within the
context of a given document collection. This is a larger context
than a single sentence, but a smaller context than language as a
whole, and at this level word association is believed to reflect
the association of words which is made in the discussion of concepts
or topics - associations not of synonymy or of proximity, as with
semantic or syntactic associations, but associations based on the
fact that words are used together to express complex concepts. For
this reason, statistical association is best applied to a homogeneous
document collection of specific and well-defined subject scope.

5.1.3 Determination of Statistical Associations 

The associative retrieval tools developed thus far in the
laboratory are based upon statistical associations. The attempt to
determine semantic and syntactic word associations is an involved task
requiring either human judgment or rather complex computer programs.
The determination of statistical word associations, on the other
hand, is a straightforward task which is very well suited to the
capabilities of the digital computer, since it requires only the
ability to compare words and to count. Statistical association
techniques are ideally suited to a file of indexed documents wherein
the content of each document is represented by a set of discrete
index terms; this permits very simple matching and counting of index
terms.

Various methods have been developed to measure the statistical
correlations between terms. Basically they all rely upon counting
the number of occurrences of single terms and the number of
co-occurrences of all possible pairs of terms. Using these data,
quantitative measures of association between two terms can be
computed. Generally, these association values are some function of:

1) The number of documents to which each of the two
terms under consideration has been assigned.

2) The number of documents to which both terms have
been assigned.

3) The total number of documents in the collection.
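
The three quantities listed above can be gathered in a single pass
over an indexed file. The following is a minimal sketch in Python,
not the Laboratory's own routines; the function name count_terms,
the dictionary layout of the file, and the four sample documents are
assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations


def count_terms(index):
    """Return (occurrences, co_occurrences, total_documents).

    `index` maps a document id to the set of index terms assigned
    to that document (a hypothetical layout chosen for this sketch).
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for terms in index.values():
        unique = sorted(set(terms))
        # 1) number of documents to which each term is assigned
        occurrences.update(unique)
        # 2) number of documents to which both terms of a pair are
        #    assigned; pairs are stored in alphabetical order
        co_occurrences.update(combinations(unique, 2))
    # 3) total number of documents in the collection
    return occurrences, co_occurrences, len(index)


# Hypothetical four-document file, for illustration only.
index = {
    "d1": {"SEARCHING", "SEMANTIC"},
    "d2": {"SEARCHING", "SEMANTIC", "INDEXING"},
    "d3": {"INDEXING"},
    "d4": {"SEARCHING"},
}
occ, co, n = count_terms(index)
print(occ["SEARCHING"], co[("SEARCHING", "SEMANTIC")], n)  # 3 2 4
```

Storing each pair in alphabetical order means a pair is counted once
regardless of the order in which the terms appear on a document.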

There are several measures based on these parameters which
have been proposed in the literature. The underlying idea in many
of these measures is to compare the number of documents actually
indexed with both terms A and B against the number of documents one
would expect to be indexed with both terms A and B, based on the
frequency of occurrence of term A in the file and the frequency of
occurrence of term B in the file, with the assumption that terms A and
B occur independently of one another. This may be clarified by an
example: Suppose a file of 100 documents dealing with information
science contains 30 documents indexed with the term SEARCHING, i.e.,
30% of the file is indexed with SEARCHING. Assume 20 documents in
the entire file are indexed with SEMANTIC. If SEARCHING and
SEMANTIC occur independently in the file then, on the basis of their
individual occurrences, we could expect 30% of 20, or six documents
in the file, to be indexed with both SEARCHING and SEMANTIC.
However, if we observe that 19 documents in the file are indexed with
both these terms, then we may reasonably conclude that there is a
strong relationship or "association" in this particular file between
these two terms.
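
The arithmetic of this example can be checked with a few lines of
Python. This is a sketch only: the ratio of observed to expected
co-occurrences used here is one simple way of expressing the
comparison and is not claimed to be the measure adopted in the
Laboratory; the function names are illustrative.

```python
def expected_cooccurrence(n_a, n_b, n_total):
    """Documents expected to carry both terms if the two terms
    occur independently of one another."""
    return n_a * n_b / n_total


def observed_to_expected(n_ab, n_a, n_b, n_total):
    """Ratio of observed to expected co-occurrences; values well
    above 1 suggest an association in this particular file."""
    return n_ab / expected_cooccurrence(n_a, n_b, n_total)


# Figures from the example: 100 documents, 30 with SEARCHING,
# 20 with SEMANTIC, 19 observed with both.
expected = expected_cooccurrence(30, 20, 100)
ratio = observed_to_expected(19, 30, 20, 100)
print(expected)         # 6.0
print(round(ratio, 2))  # 3.17
```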

Numeric measures of association may be computed by applying
various statistical formulas, each corresponding to a specific
measure, to the term occurrence and co-occurrence data observed in the
file under consideration. From these computations, tables of term
association data may be formed that show the degree to which each
index term used in the file is associated with other index terms.
It is not feasible or desirable to create tables that contain all
this word-association data; instead, such tables generally list for
each term only those few index terms that have been found to be most
highly associated with it. Again, these terms are associated
because of their frequency of co-occurrence, and not because of their
meaning; hence, two terms may be highly associated in the tables
and yet have no apparent similarity of meaning.
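
A truncated association table of this kind might be built as
sketched below in Python. The scoring function (observed divided by
expected co-occurrences), the parameter top_k, and the sample counts
for a ten-document file are all assumptions made for illustration;
any of the measures proposed in the literature could be substituted
for the score.

```python
def association_table(occurrences, co_occurrences, n_total, top_k=2):
    """For each term, list the top_k terms most highly associated
    with it, as (term, score) pairs in descending order of score."""
    scores = {term: [] for term in occurrences}
    for (a, b), n_ab in co_occurrences.items():
        # Stand-in measure: observed over expected co-occurrences.
        expected = occurrences[a] * occurrences[b] / n_total
        score = n_ab / expected
        scores[a].append((b, score))
        scores[b].append((a, score))
    table = {}
    for term, pairs in scores.items():
        # Highest score first; ties are broken alphabetically.
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
        table[term] = pairs[:top_k]
    return table


# Hypothetical counts for a ten-document file.
occurrences = {"DATA": 5, "PROCESSING": 4, "LIBRARY": 5, "CATALOG": 2}
co_occurrences = {
    ("DATA", "PROCESSING"): 4,
    ("DATA", "LIBRARY"): 1,
    ("CATALOG", "LIBRARY"): 2,
}
table = association_table(occurrences, co_occurrences, 10, top_k=1)
print(table["DATA"])     # [('PROCESSING', 2.0)]
print(table["LIBRARY"])  # [('CATALOG', 2.0)]
```

Keeping only the top few entries per term holds the table to a
manageable size, at the cost of discarding weak associations.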

5.1.4 Associative Retrieval 



Associative retrieval techniques attempt to overcome the
difficulties which result from imperfect indexing. Using index terms to
represent the content of a document is very common practice.
However, it is widely recognized that indexing is often
inadequate for various reasons: applicable terms may not be assigned
through oversight, the indexing may be too shallow to represent all
the concepts in a document, and the terms assigned may be ambiguous.

In a system which relies on a "direct match" between the terms 
given in a question and the index terms preassigned to a document, 
relevant documents may be missed because of imperfect indexing. The
probability of retrieving documents which are relevant to the
question but which are not indexed with the exact terms of the question
is increased by adding associated terms to the question. For
example, consider the case where "co-occurrence" is statistically highly
associated with "word frequency". A requestor may ask for documents 
about "word association" based on "word frequency". He may fail to 
retrieve relevant documents which are not indexed with "word fre- 
quency" but which are indexed with "co-occurrence". In associative 
retrieval, "co-occurrence" would be added to the request thereby per- 
mitting the retrieval of additional relevant documents. 
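
The elaboration of a request can be sketched in Python as follows.
The association values, the threshold of 0.5, the three-document
file, and the rule that a document is retrieved when it carries at
least one request term are all simplifying assumptions for
illustration; the request is also reduced to the single term "word
frequency" to keep the example short.

```python
def expand_query(query_terms, associations, threshold=0.5):
    """Add every term whose association with some request term
    meets the threshold."""
    expanded = set(query_terms)
    for term in query_terms:
        for other, strength in associations.get(term, {}).items():
            if strength >= threshold:
                expanded.add(other)
    return expanded


def retrieve(terms, index):
    """Ids of documents indexed with at least one of the terms."""
    return {doc for doc, doc_terms in index.items() if terms & doc_terms}


# Hypothetical association values and indexed file.
associations = {
    "word frequency": {"co-occurrence": 0.8, "typography": 0.1},
}
index = {
    "d1": {"word frequency", "word association"},
    "d2": {"co-occurrence", "word association"},
    "d3": {"typography"},
}
query = {"word frequency"}
direct = retrieve(query, index)
elaborated = retrieve(expand_query(query, associations), index)
print(sorted(direct))      # ['d1']
print(sorted(elaborated))  # ['d1', 'd2']
```

The direct match misses d2, which is indexed only with
"co-occurrence"; the elaborated request retrieves it.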

The method is not foolproof since it is based on probability
rather than certainty. Because of this, the method has the
disadvantage of causing the retrieval of some non-relevant documents
and thereby decreasing precision (i.e., the ratio of relevant
retrieved documents to total retrieved documents). However, as
indicated above, the method does improve recall (i.e., the ratio of
relevant retrieved documents to total relevant documents in the
file), and it has the further advantage of being fully automatic.
The association tables are generated automatically by the computer,
and the query is automatically elaborated in a direction which has a
high probability of retrieving additional relevant documents.
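
The two ratios just defined can be computed directly from sets of
document numbers. The sketch below is only an illustration; the
relevance judgments and the two retrieved sets are invented for the
example.

```python
def precision_and_recall(retrieved, relevant):
    """Compute precision and recall from two collections of document numbers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant              # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments for one question.
relevant = {"A0134", "A0135", "B0526", "B0567"}

# Hypothetical output of a direct-match search and of an expanded search.
direct = precision_and_recall({"A0134", "B0526"}, relevant)
expanded = precision_and_recall(
    {"A0134", "A0135", "B0526", "B0581", "B0601"}, relevant
)
```

In this invented case the expanded search raises recall from 0.5 to
0.75 while precision falls from 1.0 to 0.6, which is the trade-off
described above.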

In addition to improving recall and being fully automatic,
statistical word association techniques permit the computation of
degrees of relevance of the documents retrieved to the questions
asked. The association between a pair of terms is usually expressed
as a "correlation coefficient". Using these coefficients of
association between terms, the machine can compute the degree of
match between the terms of the original request and the terms
assigned to a document. This number will reflect the "closeness"
between a document and the request, and hence may be considered the
"computed relevance number" for the document. When several documents
are retrieved in response to a request they may be ranked according
to their "computed relevance numbers."
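
The text does not give the formula by which the degree of match is
computed. Purely as an assumption for illustration, the sketch below
takes, for each request term, its best coefficient with any term of
the document and averages these values; the coefficient table and the
documents are likewise hypothetical.

```python
def coefficient(a, b, table):
    """Coefficient of association between two terms; a term matches itself fully."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))

def computed_relevance(request, doc_terms, table):
    """Assumed scoring rule: average, over the request terms, of the best
    coefficient between that term and any term assigned to the document."""
    if not request or not doc_terms:
        return 0.0
    best = [max(coefficient(q, d, table) for d in doc_terms) for q in request]
    return sum(best) / len(best)

TABLE = {("WORD FREQUENCY", "CO-OCCURRENCE"): 0.8}   # hypothetical coefficient
DOCS = {                                             # hypothetical indexing
    "D1": ["WORD ASSOCIATION", "WORD FREQUENCY"],
    "D2": ["WORD ASSOCIATION", "CO-OCCURRENCE"],
    "D3": ["MICROFILM"],
}
request = ["WORD ASSOCIATION", "WORD FREQUENCY"]
ranking = sorted(
    DOCS, key=lambda d: computed_relevance(request, DOCS[d], TABLE), reverse=True
)
```

Under this assumed rule the document indexed with "co-occurrence"
instead of "word frequency" is still retrieved, but it ranks below the
document that matches the request directly.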

The technique of document retrieval based on statistical term 
associations is well suited to on-line presentation in an informa- 
tion processing laboratory. The method displays the potential of 
the digital computer in mechanized literature searching. It displays 
the power of word association in elaborating on requests and thereby 
enhancing retrieval. Several different measures of association can 
be implemented for demonstration and comparison. Search parameters 
and association files can be varied to produce different search re- 
sults. Analysis of these results and the reasons behind them can be 
a rich field of study for users of the Laboratory. 

The specific tools relating to associative retrieval that have
been developed during Phase I are discussed in sections 5.2 and 5.3.

5.1.5 Teaching Associative Retrieval On-Line

Although the details of the procedures for teaching associative
retrieval on-line have not yet been fully worked out, we can
indicate some of their general features. First of all, of course,
through appropriate lecture and independent reading, the student
will have learned about the basic notions of associative retrieval
and the various measures of statistical correlation that are used
to measure degrees of closeness between index tags. He will have
been introduced to the Laboratory, to terminal processing, and to
the experimental corpus of stored documents upon which the various
methods of intellectual access will be "exercised". He will have
been introduced to the indexing scheme used to identify the content
of the corpus documents and to the various ways of posing requests
to the system.

Armed with this background and initial understanding, the student
is given a topic, expressed in some detail in ordinary language,
which describes an information need. His problem is to "translate"
the description of the information need into a computer search
specification which, in turn, will select all and only those corpus
documents that will be relevant to the stated need. In response to
his search query the system will display the titles, or an abstract,
or possibly the list of cited papers as well, for each document that
is selected. After some iteration during which time the student has
modified his query, he is satisfied with the set of items that the
system has delivered; i.e., he has analyzed the output and he is
satisfied that he has selected the best class of stored items.
(Perhaps he has already been told exactly how many relevant documents
are in the file relative to his search problem.) We should emphasize
that during this process of interaction the computer not only can
select and display selected documents immediately in response to a
request, but it can be programmed to display only the differences in
retrieval that a modification in a query makes. The student can see
immediately how different query formulations "cause" different
classes of documents to be selected.
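
Displaying only the differences between two retrievals amounts to
taking set differences of the document numbers returned. A minimal
sketch follows; the two result lists are hypothetical and the function
name is our own.

```python
def retrieval_difference(before, after):
    """Return (gained, lost): documents added and dropped by a query modification."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)

first_result = ["A0134", "A0135", "B0526"]     # hypothetical output of the first query
second_result = ["A0135", "B0526", "B0567"]    # hypothetical output after modification
gained, lost = retrieval_difference(first_result, second_result)
```

Only the documents in gained and lost need be typed at the terminal,
which keeps the output short after each modification of the query.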

The student is now ready for associative retrieval. He may
initiate associative searching using each of the many different
measures of statistical correlation that will be available. For each
particular measure he will see the consequences that follow from the
use of that measure. He will see how associative retrieval widens
a search and why therefore the initial query should start in a more
narrow formulation. The student begins to develop an understanding
of the meaning of various measures of statistical association and of
their use in associative retrieval, by observing the behavior of the
on-line search system, i.e., by studying the immediate consequences
of their use.

5.2 Current Status

5.2.1 Components of the Laboratory 

The major components of the Information Processing Laboratory
upon which the on-line teaching of associative retrieval is based
consist of the following:

1) A collection of documents

2) Files of bibliographic records stored on disc that
correspond to the document collection

3) Files of term association data on disc

4) Computer search programs employing methods of
associative retrieval. (See Section 5.3)

These also provide the basic framework within which researchers
may conduct on-line experiments in associative retrieval in the
Laboratory. The sections that follow describe these major
components in detail.

To create and maintain the various data files on disc, certain
utility programs were written. A useful by-product of many of these
programs is the printed listings of various data that have helped
both students and staff in studying different elements of the
system. The utility programs developed in support of the primary
associative retrieval software are listed in Appendix 6.

5.2.2 Document File 

As a data base for illustrating associative retrieval, we
selected a collection of documents that had already been assembled
for another project at the Institute. This is a collection of
284 journal articles published since 1957 that deal with various
aspects of information science. There were several reasons why
this file was attractive. First of all, it already existed and a
lot of effort would be saved by using it. Secondly, the fact that
it was a file on information science offered two major advantages
over other files that might have been considered. One advantage
lay in the fact that the content of the documents would be of
great interest to the students. In addition, this file would be
easier for Institute staff to index in depth than would be a file
covering subject matter with which the staff was generally
unfamiliar. Finally, this collection of 284 documents seemed to be
of suitable size. On the one hand, it seemed large enough to
adequately display the principles of associative retrieval;
on the other hand, it was small enough that it could be indexed
and later processed by computer at a reasonable cost.

At the time we selected this file much bibliographic information
about each document had already been gathered and keypunched.
This included title, author, publisher, journal name, editor, year
of publication, and many other items. (For a complete list of
these items of bibliographic information see Appendix 7.) However,
this collection had not been indexed in depth. To be of use in
illustrating associative retrieval the documents would have to be
indexed.

While indexing may be carried out using the natural language 
of a document, we decided to use a controlled list of terms from 
which all indexing would be done. There were several reasons for 
this. It would reduce confusion arising from synonyms. It would 
help users formulate search requests in that the user would have 
a way of knowing in advance what terms were valid in the system 
and, therefore, to what set of index terms he was limited in posing 
requests. Finally, the use of a controlled list would concentrate 
the assignment of index tags in such a way that the co-occurrence 
data for pairs of terms would be more favorably distributed for 
the purposes of illustrating associative retrieval than would be 
the case with uncontrolled assignment of terms. To state this 
another way, indexing using the natural language of the document 
would probably result in a large number of terms, a high portion 
of which would be very lightly posted, thus yielding very low 
co-occurrence counts for nearly all pairs of terms. However, with 
a controlled list of terms, a certain portion of them would be 
expected to be rather heavily posted, resulting in rather large 
co-occurrence counts for certain term pairs. This distribution
of co-occurrence data increases the likelihood of obtaining co- 
efficients of association of such magnitude as are required for 
the effective demonstration of associative retrieval. 

5.2.3 Subject Authority List 

At the time we decided to index from a controlled list of
terms, no such list of information science terms was available.
We had to form our own list. We did this by examining a substantial
body of material in the subject field, selecting candidate
terms, identifying synonyms and related terms, and creating an
authority list accordingly. Our collection of 284 documents was
sufficiently comprehensive to contain candidate descriptors from
which a rather complete information science authority list could
be formed. Therefore, we went through the text of all 284 documents
selecting those words and phrases that we considered to be likely
candidates for the authority list. Clearly a list created from
these candidate terms would be adequate for indexing the collection
itself; it could also be expected to be applied satisfactorily to
other collections of information science documents.

In choosing candidate descriptors, no effort was made to curb
the selection of the same term or phrase repetitively. As a result
some 20,000 candidate descriptors were selected initially. By
keypunching these and using the computer to weed out duplicates, we
reduced the list to about 10,000 descriptors. The reason for this
seemingly very large number of unique candidate terms was that there
had been a high proportion of phrases chosen as opposed to single
words. This list of 10,000 terms was examined carefully, with
close attention paid to synonyms and the relationships between
candidate descriptors, and from it a final list of approximately
350 terms was formed. Generally speaking, each single word or
phrase descriptor expresses a single concept. Phrase descriptors
were included because it was felt that it would be easier for users
of the Laboratory to deal with a limited number of common topics as
single terms rather than as combinations of terms.

The list we formed to guide us in indexing has SEE and SEE
ALSO entries as well as some amplifying SCOPE NOTES. For this
reason we refer to the list as a "subject authority list" rather
than simply an "index term list." It is shown in Appendix 4.
Because of certain computing considerations the maximum length of
each index term is 16 characters.

5.2.4 Indexing the Collection 

Using our subject authority list, we indexed the 284 documents 
in our collection, assigning an average of 15 terms to each document, 
with a few documents being assigned as many as 50 terms. Each docu- 
ment was indexed to cover its primary and secondary topics. While 
the index terms were assigned primarily on the basis of the content
of the article, the words appearing in the title were also considered 
during the indexing operation. 

Several people participated in indexing the document collection. 
Recognizing that one of the weaknesses of manual indexing is the in- 
consistency that arises not only from one indexer to another but even 
within the work of one individual indexer, we held frequent meetings 
to discuss the indexing. To promote consistency some documents were 
indexed by two people independently and the results compared and dis- 
cussed with all the indexers. While we cannot vouch for the consis- 
tency and accuracy of our indexing, we believe it to be adequate for 
the present needs of the Laboratory. Appendices 4b and 4c contain
index terms sorted on frequency of assignment and on alphabetical
order, respectively.

5.2.5 Master Bibliographic Files

The information about the document collection is stored in
two distinct files on disc. One file, BIBLIO, contains all
bibliographic items except the index terms. The other file, MASTERI,
contains the index terms. Both of these files are (in the
terminology of the IBM-360 Operating System) indexed sequential files
of 80-column card images. Each document is represented by several
card images. In each file the "key" of each 80-character record
is the 5-character document accession number.
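
Because every card image carries the accession number as its key, the
several card images belonging to one document can be gathered by that
key. The sketch below illustrates the idea in Python; it assumes the
key occupies the first five characters of the record, and the spacing
of the remaining fields in the sample cards is hypothetical.

```python
from collections import defaultdict

KEY_LENGTH = 5  # the 5-character document accession number

def group_by_document(card_images):
    """Group 80-column card images by accession number, keeping file order."""
    documents = defaultdict(list)
    for card in card_images:
        card = card.ljust(80)[:80]           # normalise to one 80-character record
        documents[card[:KEY_LENGTH]].append(card)
    return dict(documents)

cards = [                                    # field spacing is an assumption
    "B052601AU1 SWANSON, D.R.",
    "B052606YR1 1965",
    "B053301AU  DYSON, G.M.",
]
by_document = group_by_document(cards)
```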

The reasons for storing and maintaining this information in 
two separate files have to do with speed of searching and speed 
of display at the remote terminal. The associative retrieval 
search programs examine only the index terms assigned to a document 
to determine whether or not the document is to be retrieved in 
response to a user's query. If all bibliographic information about 
the document collection were maintained in a single file of card 
images, the search programs would spend a substantial amount of 
time reading and passing over card images that had no bearing on 
the search itself. With the volume of information we have and 
with the card format we use, we are able to search MASTERI serially 
from start to finish in one fourth the time that would be required
if MASTERI and BIBLIO were merged into a single file. In addition, 
when using comparatively slow mechanical terminals (a display rate 
of 15 characters per second is typical), it is desirable to limit 
the output to a few characters per retrieved document. Otherwise 
many searches will result in such long output times that the user- 
system interaction is seriously impaired. Our search programs 
generally list only the retrieved document's accession number and 
"computed relevance number." No information from the BIBLIO file 
is needed to provide this limited output. One of the Laboratory 
search programs enables one to search for articles by author name. 
When this mode of search is called for, BIBLIO is read from start 
to finish and MASTERI is ignored. The following pages show sam- 
ples of MASTERI and BIBLIO. 
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PIG l: SAMPLE PRINTOUT OP BIBLIO fILE 



B0526C1AU1 
B052602 JOl 
B052603JA1 
BC52604MD 



SWANSON » D.R. 

LIBR Q VOL • 35 » NO • 1 

THE EVIDENCE UNDERLYING 

3 



THE CRANFIELD RESULTS 



QQ52605GQ1 JAN 



B052606YR1 
B052607PP1 
B052608C 1 1 
B052609BI1 
B053301 AU 
BQ53302 AU2 



1965 

1-20 

A72 

A87 

DYSON, 

COSSUM, 



G. M . 
W.E. 



B053303 AU3 
B A 53304AU4 
8053305 JA 
B053306JO 
B 053307 MD 
BC53308C0 



LYNCH, M.F. 

MORGAN, H.L. 

MECHANICAL MANIPULATION OF CHEMICAL STRUCTURE 
INFORM STOR RETRIEV VOL. 1, NOS. 2-3 
5 

JULY _ 



B053309YR 
BO 53310PP 
B056701 AU1 
B056702 JOl 
B056703 JA1 
B 056704 JA2 



1963 

69-99 

KLEMPNER, I.M. 

AMDOC VOL. 15, NO. 3 , llpnnulTtnu 

METHODOLOGY FOR THE COMPARATIVE ANALYSIS OF INFORMATION 
STflRAftF AND RETRIEVAL SYSTEMS.. A CRITICAL REVIEW 



B056705MD 

B056706C01 

B056707YR1 

B056708PP1 

B056709RE1 

BH56710RE2 



3 

JULY 

1964 

210-6 

A107 

B526 



B058101AU 
B058102JA 
B058103 JA2 
B058104 JO 
BC58105MD 
B05B106CO 



FELS, E.M. 

EVALUATION OF THE PERFORMANCE OF 
SYSTEM BY MODIFIED MOOERS PLAN 
AMDOC VOL. 14, NO. 1 
3 

JAN 



AN INFORMATION-RETRIEVAL 



B058107YR 
B058108PP 
B058109RE 
B058110RE2 
B060101 AU 



8060103 JO 
B060104MD 
B060105CQ 
B060106YR 
B060107PP 
B060108RE 
*fft)e*C91AUl 
B063802 JOl 
8063803 JA1 
B0638C4MD 
B0638C5C01 
B 063fl06YR_l 
B 063 807 P PI 



1963 
28-34 
A 106 
Alll 

JOHNSON, L.R. 

AN INDIRE CT CHAINING, 
C ACM V0L.4, NO. 5 
19 
MAY 
1961 
218-22 
A 80 



METHOD FOR ADDRESS ING ON SECONDARY KEYS. 



BROWNSON, H.L. 

SCIENCE VOL. 132, NO. 3444 

RESEARCH ON HANDLING SCIENTIFIC INFORMATION 
1 

DEC 

im 



1922-31 



Columns 7 and 8 are the code identifiers of the document attributes. See Appendix 7.

FIG 2: SAMPLE PRINTOUT OF MASTERI FILE

[Printout not legible in this copy. Each card image begins with the
document accession number, a two-digit card number, and the code LD
(e.g., A013501LD), followed by index terms from the subject authority
list such as ACCESS, DOCUMENT, LIBRARY, and INFO. RETRIEVAL.]

5.2.6 Term Association Files

As discussed earlier in this report, associative retrieval
based upon statistical term associations involves the formation
of tables of association data, one table for each measure
represented. During Phase I we have created three files of word
association data. The three measures we used were suggested in a
paper by J. L. Kuhns entitled "The Continuum of Coefficients of
Association."¹ Without going into a detailed mathematical
development, let us discuss the underlying notion behind these
measures.

If two terms used to index a collection of documents occur
independently in the collection (i.e., if they are both randomly
distributed throughout the file), one may calculate, based on their
individual frequencies of occurrence, the number of documents in
which one might expect the two terms to co-occur. This expected
number of co-occurrences may be called the "independence value"
since it is based upon the assumption that the two terms occur
independently in the file. To test the validity of this assumption,
one may observe the actual number of documents to which both
descriptors have been assigned and compare this with the
independence value. This difference may be called the "excess over
the independence value." This quantity, depending upon the statistics
of a particular collection, may range from large positive integers
to large negative integers. To provide a common basis for comparison
and calculation involving different pairs of index terms, some
sort of normalizing factor is needed to bring the coefficient of
association between any pair of terms, regardless of their occurrence
patterns, into the range from -1 to +1. The measures proposed by
Kuhns which we have used are all of the form:

\[ \frac{\text{excess over independence value}}{\text{normalizing factor}} \]

The formulae for computing these coefficients of association are:

\[ W = \frac{X - n_i n_j / N}{\min[n_i, n_j]} \]

\[ S = \frac{X - n_i n_j / N}{N/2} \]

[The formula for G is not legible in this copy.]

where X   = the number of documents to which both terms have been
            assigned

      n_i = the number of documents to which term i has been assigned

      n_j = the number of documents to which term j has been assigned

      N   = the total number of documents in the entire collection

¹ In: Statistical Association Methods for Mechanized Documentation,
Symposium Proceedings, Washington, D.C., 1964, pp. 33-39. In the
notation of that paper the measures we chose are designated W, G,
and S.

Notice that all these measures are symmetric, i.e., the coefficient
of association between terms i and j is the same as the coefficient
of association between terms j and i. Also note that all three
formulae yield a coefficient of association between a term and
itself that is not equal to unity. This seems to contradict a
basic notion of matching, i.e., that a term coincides in meaning
with itself and is not merely "close" to itself in meaning. In
building the tables of word association data used with the search
programs we have overridden this characteristic of the formulae.
In our tables the coefficient of association of every index term
with itself equals unity.
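
As a hedged illustration of these definitions, the sketch below
computes the excess over the independence value and the two measures
whose normalizing factors are stated in the text, W (the smaller of
the two posting counts) and S (N/2). The function names and the
posting counts are assumptions made for the example; the override for
a term paired with itself follows the convention just described.

```python
def excess_over_independence(x, n_i, n_j, n_docs):
    """Observed co-occurrences minus the number expected under independence."""
    return x - (n_i * n_j) / n_docs

def measure_w(x, n_i, n_j, n_docs):
    """Excess normalised by the smaller of the two posting counts."""
    return excess_over_independence(x, n_i, n_j, n_docs) / min(n_i, n_j)

def measure_s(x, n_i, n_j, n_docs):
    """Excess normalised by the constant N/2."""
    return excess_over_independence(x, n_i, n_j, n_docs) / (n_docs / 2)

def table_entry(measure, term_i, term_j, x, n_i, n_j, n_docs):
    """Coefficient as stored in the tables: a term's association with itself is unity."""
    if term_i == term_j:
        return 1.0
    return measure(x, n_i, n_j, n_docs)

N = 284                                   # size of the Laboratory collection
w = measure_w(10, 20, 40, N)              # hypothetical counts for one term pair
s = measure_s(10, 20, 40, N)
w_self = measure_w(20, 20, 20, N)         # raw formula applied to a term and itself
```

For a term posted to 20 of the 284 documents, the raw formula gives a
self-association of 1 - 20/284, which is why the tables override the
value with unity.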

Another characteristic of these formulae is this. The denominator
of measure S is constant and equal to half the number of
documents in the collection. The denominators of the other measures
are an order of magnitude smaller than this. Thus coefficients
of association based on measures W and G are generally much
larger than those of measure S and, therefore, with certain search
methods are much better suited for demonstrating associative
retrieval.

Experimentation using measure S reveals something that might
easily be missed when reading a description of various measures.
Because the denominator of measure S is a constant, it serves as
a normalizing factor only in that it prevents the magnitude of
the coefficients from exceeding unity. It does not adjust the
coefficients in such a way as to give all pairs of index terms an
opportunity of having a high coefficient of association. Since
the denominator of measure S is a constant, the relative size of
two different coefficients will be determined entirely by the
numerator. The numerator is dominated by the absolute number of
co-occurrences of two terms. Two heavily posted terms have a much
greater probability of co-occurring frequently than do two lightly
posted terms. This means that the index terms that measure S finds
highly associated with a heavily posted term will, themselves, be
heavily posted. Thus, when a user's request involving a frequently
assigned term is expanded according to measure S, several other
heavily used terms are considered. This results in the retrieval
of many more documents than if W or G were used.
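
A small numerical comparison makes the point concrete. The posting
counts below are invented; only the collection size of 284 and the
normalizing factors of S (N/2) and W (the smaller posting count) come
from the text.

```python
N = 284  # documents in the collection

def excess(x, n_i, n_j):
    """Excess of observed co-occurrences over the independence value."""
    return x - n_i * n_j / N

# Hypothetical counts: (co-occurrences, postings of term i, postings of term j)
heavy_pair = (60, 100, 80)    # two heavily posted terms
light_pair = (5, 5, 6)        # two lightly posted terms that nearly always co-occur

s_heavy = excess(*heavy_pair) / (N / 2)
s_light = excess(*light_pair) / (N / 2)
w_heavy = excess(*heavy_pair) / min(heavy_pair[1:])
w_light = excess(*light_pair) / min(light_pair[1:])
```

With these counts S is about 0.22 for the heavy pair and about 0.03
for the light pair, whereas W is about 0.40 and 0.98 respectively.
Measure S therefore prefers the heavily posted pair, while W prefers
the lightly posted pair that almost always co-occurs.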

Typical use of measure S does produce this result, and this 
is a good example of how the Laboratory allows a better under- 
standing through active, on-line experimentation. It prompts a 
student to examine the nature of this measure whereas he might 
well overlook this analysis without the "hands-on" experience in 
the Laboratory. 

To actually generate term association files, computer programs
were written that counted the single occurrences of individual
descriptors and the co-occurrences of pairs of terms. Applying
specific association measures to this data, the coefficients of
association for all possible pairs of terms were calculated. Then
a disc file was written corresponding to each measure. These files
list for every index term the four other descriptors most highly
associated with that term according to the specific formula used.
The respective coefficients are also stored in the file. A portion
of one of the association files is shown on the following page.
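
The generation procedure can be sketched as follows. This is a minimal
illustration in Python, not the Laboratory's program: the indexing data
are hypothetical, measure W is used as the example measure, and ties
are broken alphabetically as an arbitrary choice.

```python
from collections import Counter
from itertools import combinations

def build_association_table(index, top=4):
    """index maps document number -> list of index terms.
    Returns term -> the `top` most highly associated (term, coefficient) pairs."""
    n_docs = len(index)
    single = Counter()                       # postings per term
    pairs = Counter()                        # co-occurrences per unordered pair
    for terms in index.values():
        unique = sorted(set(terms))
        single.update(unique)
        pairs.update(combinations(unique, 2))

    table = {term: [] for term in single}
    for a, b in combinations(sorted(single), 2):
        x = pairs[(a, b)]                    # zero if the pair never co-occurs
        excess = x - single[a] * single[b] / n_docs
        coeff = excess / min(single[a], single[b])     # measure W
        table[a].append((b, coeff))
        table[b].append((a, coeff))
    for term in table:
        table[term].sort(key=lambda item: (-item[1], item[0]))
        del table[term][top:]                # keep only the highest coefficients
    return table

INDEX = {   # hypothetical indexing of four documents
    "D1": ["LIBRARY", "CIRCULATION", "SERVICE"],
    "D2": ["LIBRARY", "CIRCULATION"],
    "D3": ["INDEXING", "THESAURUS"],
    "D4": ["INDEXING", "THESAURUS", "LIBRARY"],
}
table = build_association_table(INDEX)
```

The sketch examines every pair of terms, which is practical for an
authority list of a few hundred terms.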

The word association data files are indexed sequential files of
logical records of 100 characters each. Each logical record
consists of five fields, one for the primary term and each of its
four associated terms. Each field is 20 characters long, 16
characters for the name of the term and 4 characters for the
coefficient of association. The "key" field is the 16 character name
of the term. These logical records are read directly as soon as
individual request terms submitted by the user are encountered.
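
The record layout just described can be unpacked by fixed-width
slicing. In the sketch below the sample terms and coefficient digits
are hypothetical, and the four coefficient characters are kept as text
because their encoding is not specified here.

```python
FIELD_WIDTH, NAME_WIDTH, FIELDS = 20, 16, 5

def parse_association_record(record):
    """Split a 100-character logical record into the primary term and a
    list of (associated term, coefficient text) pairs."""
    if len(record) != FIELD_WIDTH * FIELDS:
        raise ValueError("expected a 100-character logical record")
    fields = []
    for start in range(0, len(record), FIELD_WIDTH):
        field = record[start:start + FIELD_WIDTH]
        fields.append((field[:NAME_WIDTH].rstrip(), field[NAME_WIDTH:]))
    primary, associated = fields[0], fields[1:]
    return primary[0], associated

def make_field(term, coeff_text):
    """Build one 20-character field: 16-character name plus 4-character coefficient."""
    return term.ljust(NAME_WIDTH)[:NAME_WIDTH] + coeff_text

# Hypothetical record for illustration only.
record = "".join(make_field(t, c) for t, c in [
    ("STORAGE", "9999"), ("MICROFICHE", "9999"), ("FILE", "9860"),
    ("RETRIEVAL", "9180"), ("UPDATING", "9789"),
])
term, associated = parse_association_record(record)
```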










FIG 3: SAMPLE PRINTOUT OF WORD ASSOCIATION FILE (KUHNS W)

[Printout not legible in this copy. Each line of the listing shows a
primary index term (e.g., SEARCHING, SET THEORY, STORAGE, THESAURUS)
together with its four most highly associated terms and their
four-digit coefficients of association.]


5.3 On-Line Retrieval

The task of a document retrieval system is to select from a
file those documents, and only those documents, which are relevant
to a user's request. A user approaches the system with a question
expressed in natural language (the language of ordinary speech).
He must first translate his question into a language and form
acceptable to the system. This initial step, known as "question
analysis" or "query formulation", involves the translation of the
concepts of the question into the indexing language used by the
system. Indexing and query formulation are symmetrical processes;
both translate natural language concepts into a set of terms from
the thesaurus. Retrieval is then a matching of term sets. The query
is input to the system, the search routines are executed, and output
from the system is a list of document numbers purportedly relevant
to the original question.

In the Information Processing Laboratory, retrieval takes place
on-line. The user types in his request at the remote terminal. A
record containing a document number with its index terms is read into
the computer from the MASTERI file (on disc) and the terms are
compared to the terms of the query. Where the terms match to a certain
specified criterion, the document is retrieved and its number
is typed at the terminal.

To provide a variety of ways to pose requests and thereby pro- 
vide different approaches to conducting experiments in associative 
retrieval, three search programs have been developed. In all three 
programs the user may invoke any of the three word association files,
or, if he wishes, he may ignore these files and search in "direct 
match" mode. 

The three different search programs each use a different type
of question format as input. These formats represent different
levels of complexity in posing questions. By offering this range
of choice in question formats, the system provides a flexibility
which is highly desirable for its purpose of serving users with
varying degrees of sophistication. The user, whether novice or
specialist, can choose that method of posing questions which best
suits his level of knowledge and his needs. The ability to choose
different question formats also provides the opportunity for
experimentation; a student may express the same essential question
in different formats in order to determine the impact of varying
this parameter. While varying the form of the question, he may also
vary his search strategy and the association files he uses, and thus
discover the optimal combinations of different approaches.

5.3.1 Search Program No. 1 

In the first search program, designated LABSRC 1 (for LAB SeaRCh
program 1), the user expresses his question in the form of a single
list of terms. The terms are logically independent and are not
structured into a Boolean expression; the logical operators "and,"
"or," and "not" are not used. An important parameter of LABSRC 1 is
the "minimum value" or "minval" which a document must have in order
for it to be retrieved. The user specifies this "minval" when he
inputs his query.

In the direct match mode this minimum value is simply the number
of query terms that must have been assigned to a document in order
for it to be retrieved. For example, if a user made this request,

application
list
structure
natural language
translation
minval = 04.00

he is specifying that at least four of the five terms of the question
must have been assigned to any document that is retrieved. He cannot
specify which four must be present; any document with fewer than four
will not be retrieved.

When he performs a search using the word association files, the 
coefficients of association play a role in determining a document's 
"value" or "weight". Each query term present in the document contri- 
butes a weight of 1.0, just as in the direct match mode. When a 
query term is found to be absent from the document's list of assigned 
terms, the four terms associated with the missing query term (as 
found in the specified word association file) are matched against the 
document. If one of these associated terms is assigned to the docu- 
ment, its coefficient of association will be added to the retrieval 
weight of that document. If more than one of the four associated 
terms is assigned to the document, the one having the largest coef- 
ficient of association will contribute this value to the retrieval 
weight of the document. This retrieval weight is the sum of the 
values of the individual terms: terms present in the original query
contributing a value of 1.0, and associated terms contributing the 
value of their coefficients. After all of the query terms have been 
matched against the document in this way, the document's retrieval 
weight is compared to the minimum value specified by the user. If 
the document's weight is greater than or equal to the minimum value, 
the document is retrieved; if not, the document is rejected. 

For example, suppose a document is indexed with the terms "list,"
"structure," and "natural language." It would not satisfy the request
above if the search were performed in direct match mode, since it has
only three of the query terms. However, if we use word association
from the Kuhns W file, the document is retrieved because it has the
term "program" associated with the query term "application" (with a
coefficient of .5616), and the term "comp. linguistics" associated
with the query term "translation" (with a coefficient of .5575). By
adding the values of the query terms (a total of 3.00) and the
associated terms (.5616 + .5575), we reach a total of 4.11, which is
above the "minval" of 4.00. This document is therefore retrieved.
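
The weighting rule of LABSRC 1 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
Laboratory's program: the function names and the dictionary layout of
the association file are hypothetical, and the document is assumed to
carry the terms "program" and "comp. linguistics" in addition to the
three query terms, as the example above implies.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical layout of a word association file: each query term maps to
# its associated terms and their coefficients of association.
AssocFile = Dict[str, Dict[str, float]]


def retrieval_weight(query: Iterable[str], doc_terms: Set[str],
                     assoc: Optional[AssocFile] = None) -> float:
    """Sum the contributions of the query terms for one document.

    A query term assigned to the document contributes 1.0. A missing query
    term contributes the largest coefficient among its associated terms that
    are assigned to the document, or nothing if there is none. Passing
    assoc=None corresponds to direct match mode.
    """
    weight = 0.0
    for term in query:
        if term in doc_terms:
            weight += 1.0
        elif assoc is not None:
            present = [c for t, c in assoc.get(term, {}).items()
                       if t in doc_terms]
            weight += max(present, default=0.0)
    return weight


def is_retrieved(query: Iterable[str], doc_terms: Set[str], minval: float,
                 assoc: Optional[AssocFile] = None) -> bool:
    """A document is retrieved when its weight reaches the user's minval."""
    return retrieval_weight(query, doc_terms, assoc) >= minval


query = ["application", "list", "structure", "natural language", "translation"]
doc = {"list", "structure", "natural language", "program", "comp. linguistics"}
kuhns_w = {  # only the two associations quoted in the text
    "application": {"program": 0.5616},
    "translation": {"comp. linguistics": 0.5575},
}

direct = retrieval_weight(query, doc)
associative = retrieval_weight(query, doc, kuhns_w)
print(direct, is_retrieved(query, doc, 4.0))
print(round(associative, 4), is_retrieved(query, doc, 4.0, kuhns_w))
```

In direct match mode the sketch counts three matching terms and rejects
the document; with the two coefficients added it passes the minval of
4.00, in agreement with the worked example.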

When searching in direct match mode, the user specifies a "minval"
which is an integer, since it represents the minimum number of
terms which a document must have for retrieval. When searching in 
the associative search mode, he specifies a "minval" which can be 
either an integer or a decimal fraction, such as 3.5 or 4.78. In 
this mode he is specifying the minimum "weight" which a document must 
have for retrieval. The number of query terms and the coefficients 
of association for associated terms determine a document's retrieval 
weight, and the user's "minval" determines which documents should be 
retrieved. Thus, the coefficients of association play a role in 
selecting or rejecting documents in LABSRC 1. 

The retrieval weight of a document can be interpreted as its 
"computed relevance number" since it represents the degree of match 
between the document and the request.

A sample printout from the 2740 terminal illustrating the oper- 
ation of LABSRC 1 is given on the next page. 








FIG 4: SAMPLE RUN FROM LABSRC 1

TERMINAL CLEAR
labsrc1

WILL YOU SEARCH ON INDEX TERMS?
yes

SELECT WORD ASSOCIATION FILE
kuhnsw

DO YOU WANT OPERATING INSTRUCTIONS?
yes

ENTER LIST OF TERMS, ONE PER LINE. ENTER 'MINVAL' AS:
MINVAL=XX.XX OR MINVAL=XX.XX* (*MEANS WORD ASSOCIATION
DATA WILL BE IGNORED). MINVAL ENTRY LINE MAY BE ANYWHERE.
LAST ENTRY MUST BE 'END'.

application
list
structure
natural language
translation
minval=04.00*
end

FILE IS NOW BEING SEARCHED
B0260: VALUE=04.00
END OF SEARCH

WILL YOU SEARCH ON INDEX TERMS?
yes

SELECT WORD ASSOCIATION FILE
kuhnsw

DO YOU WANT OPERATING INSTRUCTIONS?
no

application
list
structure
natural language
translation
minval=04.00
end

FILE IS NOW BEING SEARCHED
A0004: VALUE=04.11
B0260: VALUE=04.00
B0638: VALUE=04.20
END OF SEARCH



5.3.2 Search Program No. 2

A second search program, LABSRC 2 (LAB SeaRCh program 2),
provides the user with some degree of logical flexibility in posing
questions. He is able to specify certain terms that must be present
in a document, others that must not be present, and groups of
alternative terms any one of which must be present. In this program,
the user enters his query as a set of term lists, each of which is
preceded by one of the logical operators "and," "or," and "not." These
lists may be thought of as question fragments. There is an implied
"and" operator joining the logical query fragments represented by
each list.



LABSRC 2 differs from LABSRC 1 in two major ways. In the form 
of the query, LABSRC 2 permits the use of several term lists with 
logical operators rather than a single term list without logical op- 
erators. In retrieval LABSRC 2 depends on satisfying the several 
conditions of the retrieval prescription, rather than depending on 
the matching of a document's weight with the user's minimum value. 

A typical query for LABSRC 2 is the following:

AND
info retrieval
experiment
OR
recall
precision
OR
design
evaluation
NOT
cost

In other words, the user wants only those documents indexed with
both the terms 'info retrieval' and 'experiment', plus either
'recall' or 'precision', plus either 'design' or 'evaluation', and the
user does not want documents indexed with 'cost'.

The requestor may submit one AND list, one NOT list, and up to
four OR lists. A maximum of 10 terms may appear in the AND
list, a maximum of 5 terms in the NOT list, and a maximum of 5
in each of the OR lists. He may search in direct match mode, or he
may invoke word association from one of the three term association
files.

If a document is to be retrieved, it must satisfy for each of
the three types of lists either of the two following sets of
conditions (a and b), depending on whether the search is performed in
direct match mode or in associative mode:

AND  a) Direct match: All the terms in the AND list must be
        present in the document.

     b) Associative: If one of the terms of the AND list is
        absent, one of its associated terms must be present.

OR   a) Direct match: At least one of the terms in each OR
        list must be present.

     b) Associative: If none of the terms of an OR list is
        present in the document, one of the terms associated
        with one of the terms of the list must be present.

NOT  a) Direct match: The document must not have any of the
        terms of the NOT list.

     b) Associative: The association file is not used when
        processing the NOT list; it is always processed in
        direct match mode. A document fails to satisfy the
        NOT list requirement only if one of the terms
        submitted by the user is present; there is no reference
        to the word association files. The presence in a
        document of a term that is highly associated with a
        term in the NOT list is not deemed sufficient cause
        for the document to be rejected.

In sum, when LABSRC 2 is used in the direct match mode, a document
is rejected if any term in the user's AND list is missing from
the document, if all the terms in any of the user's OR lists are
missing, or if any term in the user's NOT list is present. In
associative retrieval mode, the same rules hold except that a term in
the association file may be substituted for a missing term in the
AND and OR lists. Unlike in LABSRC 1, it is not the "weight" of a
document which determines whether or not it will be retrieved but
simply the presence or absence of query and associated terms. The user
does not set a "minimum value"; all documents which satisfy the
retrieval conditions are retrieved.
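
The retrieval conditions summarized above can be written as a single
test. The following Python sketch is an illustration only; the function
names, the dictionary form of the association file, and the association
of "test" with "experiment" (and of "price" with "cost") are
assumptions made for the example, not data from the Laboratory's files.

```python
from typing import Dict, Optional, Sequence, Set

AssocFile = Dict[str, Dict[str, float]]  # hypothetical association file layout


def term_satisfied(term: str, doc_terms: Set[str],
                   assoc: Optional[AssocFile]) -> bool:
    """True if the term is assigned, or (associative mode only) if one of
    its associated terms is assigned to the document."""
    if term in doc_terms:
        return True
    if assoc is None:
        return False
    return any(t in doc_terms for t in assoc.get(term, {}))


def satisfies(doc_terms: Set[str], and_list: Sequence[str],
              or_lists: Sequence[Sequence[str]], not_list: Sequence[str],
              assoc: Optional[AssocFile] = None) -> bool:
    """Apply the AND, OR and NOT conditions; assoc=None means direct match."""
    # NOT list: always processed in direct match mode.
    if any(t in doc_terms for t in not_list):
        return False
    # AND list: every term (or a substitute) must be present.
    if not all(term_satisfied(t, doc_terms, assoc) for t in and_list):
        return False
    # Each OR list: at least one term (or a substitute) must be present.
    return all(any(term_satisfied(t, doc_terms, assoc) for t in or_list)
               for or_list in or_lists)


AND = ["info retrieval", "experiment"]
ORS = [["recall", "precision"], ["design", "evaluation"]]
NOT = ["cost"]

doc_a = {"info retrieval", "experiment", "recall", "evaluation"}
doc_b = doc_a | {"cost"}
doc_c = {"info retrieval", "test", "precision", "design"}
doc_d = doc_a | {"price"}
assoc = {"experiment": {"test": 0.40}, "cost": {"price": 0.90}}  # invented

print(satisfies(doc_a, AND, ORS, NOT))          # all conditions met
print(satisfies(doc_b, AND, ORS, NOT))          # rejected by the NOT list
print(satisfies(doc_c, AND, ORS, NOT))          # 'experiment' is missing
print(satisfies(doc_c, AND, ORS, NOT, assoc))   # 'test' substitutes for it
print(satisfies(doc_d, AND, ORS, NOT, assoc))   # association ignored for NOT
```

Note that the coefficients play no part in this test; in LABSRC 2 they
matter only for the computed relevance number.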

However, the "retrieval weight" of a document is used in LABSRC
2 as a "computed relevance number." The computed relevance number
does not influence which documents will be retrieved; it is merely
displayed for the user's information. In associative retrieval mode
LABSRC 2 calculates a weight for each list or query fragment in the
following way: each term in the AND-list that is present in the
document is assigned a weight of 1.0. If the query term is absent but
an associated term is present, its coefficient of association is
assigned as the weight for that term. The cumulative weight for the
AND-list is the product of the weights of the individual terms. If
a term in an OR-list is present in the document, the OR-list
is assigned a weight of 1.0. If no OR-list term is present, the
OR-list is given a weight equal to the coefficient of association of
that term in the document that is most highly associated with one of
the terms in the OR-list. The NOT list does not play a part in the
calculation. The weights of the AND-list and each OR-list are
multiplied to yield a "computed relevance number" for the retrieved
document that is displayed to the user along with the document
accession number. In direct match mode LABSRC 2 merely lists the
accession numbers of the retrieved documents; there is no "computed
relevance number" since all retrieved documents would have identical
numbers, their retrieval being dependent on the same prescription of
required query terms. A sample question for LABSRC 2 is given below,
followed by a sample computer run.

SAMPLE LABSRC 2 QUESTION

Assume a document is indexed with the terms:

EDUCATION
PLANNING
LABORATORY
ON-LINE
BIBLIOGRAPHY
SERVICE

Further assume:

RESEARCH is associated with LABORATORY by (.8)
REMOTE TERMINAL is associated with ON-LINE by (.9)
LIBRARY is associated with BIBLIOGRAPHY by (.6)
LIBRARIAN is associated with SERVICE by (.7)

Assume a query composed of these 3 lists:

AND
EDUCATION
PLANNING
RESEARCH

OR
COMPUTER
NETWORK
REMOTE TERMINAL

OR
LIBRARY
LIBRARIAN

This document's relevance number would be computed as follows:

Weight of AND list:

   EDUCATION is present                                  = 1.0
   PLANNING is present                                   = 1.0
   RESEARCH is absent but LABORATORY is present          = .8
   AND list = 1.0 x 1.0 x .8                             = .8

Weight of first OR list:

   No term submitted by user is present but
   REMOTE TERMINAL is highly associated with ON-LINE
   first OR list                                         = .9

Weight of second OR list:

   LIBRARY is absent but BIBLIOGRAPHY is present
   LIBRARIAN is absent but SERVICE is present
   weight = max (.6, .7)                                 = .7

The relevance number of the document is the product of the weights
of the individual lists, i.e., (.8) x (.9) x (.7) = .504
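
The same computation can be checked mechanically. The Python sketch
below is a hedged illustration of the scoring rule, not the LABSRC 2
code; the function names are hypothetical, and where a missing AND term
has several associated terms in the document the sketch assumes the
largest coefficient is used, which the text does not state explicitly.

```python
from math import prod  # Python 3.8 or later
from typing import Dict, Sequence, Set

AssocFile = Dict[str, Dict[str, float]]  # hypothetical association file layout


def term_weight(term: str, doc_terms: Set[str], assoc: AssocFile) -> float:
    """1.0 if the term is assigned; otherwise the best coefficient among
    its associated terms that are assigned (0.0 if there is none)."""
    if term in doc_terms:
        return 1.0
    return max((c for t, c in assoc.get(term, {}).items() if t in doc_terms),
               default=0.0)


def and_weight(terms: Sequence[str], doc_terms: Set[str],
               assoc: AssocFile) -> float:
    """Product of the individual term weights."""
    return prod(term_weight(t, doc_terms, assoc) for t in terms)


def or_weight(terms: Sequence[str], doc_terms: Set[str],
              assoc: AssocFile) -> float:
    """1.0 if any term is present, else the highest available coefficient."""
    return max(term_weight(t, doc_terms, assoc) for t in terms)


def relevance(doc_terms: Set[str], and_list: Sequence[str],
              or_lists: Sequence[Sequence[str]], assoc: AssocFile) -> float:
    """Product of the AND-list weight and every OR-list weight."""
    return and_weight(and_list, doc_terms, assoc) * prod(
        or_weight(o, doc_terms, assoc) for o in or_lists)


doc = {"EDUCATION", "PLANNING", "LABORATORY", "ON-LINE",
       "BIBLIOGRAPHY", "SERVICE"}
assoc = {
    "RESEARCH": {"LABORATORY": 0.8},
    "REMOTE TERMINAL": {"ON-LINE": 0.9},
    "LIBRARY": {"BIBLIOGRAPHY": 0.6},
    "LIBRARIAN": {"SERVICE": 0.7},
}
and_list = ["EDUCATION", "PLANNING", "RESEARCH"]
or_lists = [["COMPUTER", "NETWORK", "REMOTE TERMINAL"],
            ["LIBRARY", "LIBRARIAN"]]

number = relevance(doc, and_list, or_lists, assoc)
print(round(number, 3))
```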

FIG 5: SAMPLE RUN FROM LABSRC 2

TERMINAL CLEAR
labsrc2

THIS IS LAB SEARCH PROGRAM 2

DO YOU WANT TO USE WORD ASSOCIATION DATA?
no

PLEASE ENTER YOUR REQUEST
and
info. retrieval
experiment
or
recall
precision
or
design
evaluation
not
cost
end

FILE NOW BEING SEARCHED
DOC. NUMBER=A0072
DOC. NUMBER=A0087
END OF SEARCH

PLEASE SPECIFY RESTART OR EXIT
restart

DO YOU WANT TO USE WORD ASSOCIATION DATA?
yes

WHICH ASSOCIATION FILE DO YOU WANT TO USE?
kuhnsg

PLEASE ENTER YOUR REQUEST
and
info. retrieval
experiment
or
recall
precision
or
design
evaluation
not
cost
end

FILE NOW BEING SEARCHED
DOC. NUMBER=A0003     COEFF=0.00
DOC. NUMBER=A0019     COEFF=0.17
DOC. NUMBER=A0072     COEFF=1.00
DOC. NUMBER=A0073     COEFF=0.19
DOC. NUMBER=A0087     COEFF=1.00
DOC. NUMBER=A0107     COEFF=0.32
END OF SEARCH





5.3.3 Search Program No. 3 

The final search program based on associative retrieval, LABSRC
3, is by far the most sophisticated of the three. The most
distinguishing feature of the program is the dynamic interaction
permitted between system and user. It provides the user complete
logical flexibility in posing requests. Any logically valid
combination of index tags and operators may be used. LABSRC 3 permits
any level of parenthetic nesting in phrasing questions. With this
program the user
may emphasize certain elements of his request by assigning weights to 
individual terms or to parenthetic fragments of his question. Just 
as with the other search programs, LABSRC 3 may be used in either the 
direct match mode or associative retrieval mode. In the latter mode, 
LABSRC 3 can assign a computed relevance number to each retrieved 
document that can be used to rank the documents according to the 
closeness of match between document and request as determined by 
the system. 

The interaction between system and user may take many forms.
Among these are the following: the user may modify his request in
a variety of ways; he may exercise some control over the extent to
which his question will be expanded by the use of term association
data; he may choose to display only a selected subset of the
retrieved documents; he may alter the normal flow of control through
the program. This interactive capability is achieved through the
use of a special "command language" within LABSRC 3. A full
description of the command language, along with other detailed
information about LABSRC 3, is given in Appendix 5.

Without discussing the inner workings of LABSRC 3 in great de- 
tail, let us describe in a general fashion the way this search pro- 
gram operates. If the user does not invoke any options of the com- 
mand language, LABSRC 3 will proceed in a "normal pass" by asking 
the following questions, each of which may be regarded as a major 
junction point in the program: 

1. DO YOU WANT WORD ASSOCIATION?

2. PLEASE SPECIFY ASSOCIATION FILE.
   (Only applies to associative search)

3. DO YOU WANT SCORING?*
   (Only applies to associative search)

4. ENTER BOOLEAN EXPRESSION.

5. DO YOU WANT RESULTS PRINTED?

6. SPECIFY RESTART OR EXIT.

*Scoring = computation of relevance numbers.

In a normal run the only options offered by LABSRC 3 that are
not available in the other search programs are the options to
suppress scoring and to choose how much output is to be displayed. Of
course, the format used in entering the question is much different.

A question posed to LABSRC 3 might take this form:

'RETRIEVAL' AND 'EFFECTIVENESS' AND ('RECALL' OR 'PRECISION')
AND NOT 'COST'

Searching in direct match mode proceeds as in the other search
programs; terms required by the question must be assigned to a docu- 
ment for it to be retrieved. Scoring does not apply when using the 
direct match mode. When scoring is called for in associative re- 
trieval mode, relevance numbers are computed in the same manner as 
in LABSRC 2. 

LABSRC 3 permits the user to assign weights to individual terms
or parenthetic fragments of his question. In this way he can
emphasize the importance of certain elements of the query. The weights
are entered as decimal fractions in the range .0000 to .9999. In
scoring, the assigned weights become multiplying factors in
determining the computed relevance number. It is in this way that the
relevance measure of a retrieved document reflects the closeness of
the document to the user's weighted request.

In the example question above no weights are assigned. In this 
case every term and parenthetic expression bears an implied weight 
of 1.0. 
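
One way to picture weighted scoring over a nested question is as a
walk over an expression tree. The sketch below is an assumption-laden
illustration, not the LABSRC 3 algorithm: it extends the LABSRC 2 rule
by treating every AND as a product and every OR as a maximum at any
nesting depth, lets NOT fragments contribute a neutral 1.0, and uses a
hypothetical Node class. Appendix 5 remains the authority on the actual
method.

```python
from dataclasses import dataclass, field
from math import prod  # Python 3.8 or later
from typing import Dict, List, Set

AssocFile = Dict[str, Dict[str, float]]  # hypothetical association file layout


@dataclass
class Node:
    """One element of a question: a term or a parenthetic fragment."""
    op: str                                  # "term", "and", "or" or "not"
    term: str = ""
    children: List["Node"] = field(default_factory=list)
    weight: float = 1.0                      # implied weight if none assigned


def score(node: Node, doc_terms: Set[str], assoc: AssocFile) -> float:
    """Computed relevance contribution of a node, times its assigned weight."""
    if node.op == "term":
        if node.term in doc_terms:
            base = 1.0
        else:
            base = max((c for t, c in assoc.get(node.term, {}).items()
                        if t in doc_terms), default=0.0)
    elif node.op == "and":
        base = prod(score(c, doc_terms, assoc) for c in node.children)
    elif node.op == "or":
        base = max(score(c, doc_terms, assoc) for c in node.children)
    elif node.op == "not":
        base = 1.0   # assumption: NOT fragments do not enter the calculation
    else:
        raise ValueError(f"unknown operator: {node.op}")
    return node.weight * base


def question(effectiveness_weight: float = 1.0) -> Node:
    """'RETRIEVAL' AND 'EFFECTIVENESS' AND ('RECALL' OR 'PRECISION')
    AND NOT 'COST', with an optional weight on 'EFFECTIVENESS'."""
    return Node("and", children=[
        Node("term", "RETRIEVAL"),
        Node("term", "EFFECTIVENESS", weight=effectiveness_weight),
        Node("or", children=[Node("term", "RECALL"),
                             Node("term", "PRECISION")]),
        Node("not", children=[Node("term", "COST")]),
    ])


doc = {"RETRIEVAL", "EFFECTIVENESS", "PRECISION"}
unweighted = score(question(), doc, {})
weighted = score(question(0.7), doc, {})
print(unweighted, round(weighted, 4))
```

With no weights assigned every factor is 1.0; assigning .7 to one term
scales the relevance number of every document that matches through it.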



The power of LABSRC 3 lies in its capacity for interaction with
the user. At any point other than where the user is asked to enter
his request, one may reply with a command that alters the normal
flow of the program. The command may be submitted in natural
language; the Laboratory user need not learn a great number of special
keywords and formats to use the command language of LABSRC 3. LABSRC
3 has a text analyzer that examines commands entered by the user
and identifies certain verbs and keywords to determine what command
is to be initiated.

Appendix 5 lists all commands available within LABSRC 3. We 
cite only a few of them here to illustrate how a sequence of inter- 
active operations initiated by the user might proceed. 

Suppose after processing the above question ('RETRIEVAL' AND
'EFFECTIVENESS' AND ('RECALL' OR 'PRECISION') AND NOT 'COST') in a
normal pass the user wishes to de-emphasize the term 'EFFECTIVENESS'.
He may do this by responding to SPECIFY RESTART OR EXIT with the
command:

ASSIGN .7000 TO 'EFFECTIVENESS'

In scoring, the weight of EFFECTIVENESS (or an associated term) will
be multiplied by .7. This will result in a lower ranking for those
documents that formerly had a high relevance ranking due to the
presence of EFFECTIVENESS.

If the user wants to analyze why certain results are obtained, 
he may display the terms associated with the descriptors in his re- 
quest as they appear in the term association data file he has chosen. 
This may be done with the command: 

DISPLAY ALL ASSOCIATION DATA 

If the user wishes to display just those terms associated with 
EFFECTIVENESS, he could issue the command: 

DISPLAY TERMS ASSOCIATED WITH 'EFFECTIVENESS'

Other commands enable one to display associated terms to any 
desired level, e.g., the most highly associated term only or the 
two or three most highly associated terms. If the user wishes to 
modify the search in such a way that, for example, only the two 
most highly associated terms for each request term participate in 
the search, then he may enter the command: 

MODIFY TO USE TWO MOST HIGHLY ASSOCIATED TERMS 

As another example, if the user wanted to expand the search in 
the normal way according to a specified word association file with 
the exception that no descriptor associated with RETRIEVAL should 
be considered unless its coefficient of association is greater than 
.8, he could use this command: 

SEARCH USING ONLY THE TERMS RELATED TO 'RETRIEVAL' THAT ARE #GT# .8000
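
A text analyzer of the kind described, one that picks out verbs and
keywords rather than demanding a fixed format, might be sketched as
follows. Everything here is hypothetical: the verb table, the action
names, and the returned dictionary are invented for illustration, and
the actual command set is the one listed in Appendix 5.

```python
import re
from typing import Dict, List, Optional

# Hypothetical verb-to-action table (not taken from Appendix 5).
VERBS = {
    "ASSIGN": "assign_weight",
    "DISPLAY": "display_associations",
    "MODIFY": "modify_search",
    "SEARCH": "search",
}


def analyze_command(text: str) -> Dict[str, object]:
    """Spot the first known verb, the quoted index terms, and any decimal
    fractions in a free-form command line."""
    upper = text.upper()
    # Quoted strings are index terms, e.g. 'EFFECTIVENESS'.
    terms: List[str] = [t.strip() for t in re.findall(r"'([^']*)'", upper)]
    # Remove them so that a term is never mistaken for a verb.
    unquoted = re.sub(r"'[^']*'", " ", upper)
    numbers = [float(n) for n in re.findall(r"\d*\.\d+", unquoted)]
    action: Optional[str] = None
    for word in re.findall(r"[A-Z]+", unquoted):
        if word in VERBS:
            action = VERBS[word]
            break
    return {"action": action, "terms": terms, "numbers": numbers}


for line in (
    "ASSIGN .7000 TO 'EFFECTIVENESS'",
    "DISPLAY ALL ASSOCIATION DATA",
    "SEARCH USING ONLY THE TERMS RELATED TO 'RETRIEVAL' THAT ARE #GT# .8000",
):
    print(analyze_command(line))
```

A fuller analyzer would also have to interpret number words ("TWO") and
relational keywords such as #GT#; the sketch stops at identifying the
verb and its operands.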

Unlike the other two search programs, LABSRC 3 does not auto- 
matically display the accession number of each document satisfying 
the user's request as soon as the document is examined. Instead it 
counts the number of documents that satisfy the question and reports 
this number to the user. The purpose of this is to provide the user 
with control over the volume of the output. The user may then call 
for the display of all of them or only some of them as he desires. 
When associative retrieval mode is used, LABSRC 3 will, if the user 
wishes, sort the retrieved documents into either ascending or de- 
scending order according to the values of the computed relevance 
numbers. This feature is optional rather than fixed because there 
are certain experiments (such as comparative studies) in which a 
student may want the listing in accession number order. 
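
The count-first, display-on-request behavior can be illustrated with a
short Python sketch. The function and its parameters are hypothetical;
the accession numbers and relevance values are borrowed from the
LABSRC 3 sample run purely as test data.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (accession number, computed relevance number)


def select_for_display(hits: List[Hit], order: str = "accession",
                       limit: Optional[int] = None) -> List[Hit]:
    """Return the hits to print, in the order the user asked for.

    order is "accession", "ascending" or "descending" (by relevance);
    limit=None displays every retrieved document.
    """
    if order == "accession":
        ranked = sorted(hits, key=lambda h: h[0])
    elif order == "ascending":
        ranked = sorted(hits, key=lambda h: h[1])
    elif order == "descending":
        ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    else:
        raise ValueError(f"unknown order: {order}")
    return ranked[:limit]


hits = [("A0022", 0.302), ("A0030", 0.737), ("A0135", 0.857), ("B0054", 0.591)]

print(f"{len(hits):03d} DOCUMENTS SATISFY EXPRESSION")  # reported first
top_two = select_for_display(hits, order="descending", limit=2)
for accession, value in top_two:
    print(accession, value)
```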

Another powerful feature of LABSRC 3 is the parser used to
process the Boolean expressions that constitute search requests.
The parser produces directly executable code; there is no
intermediate form generated. The logic of the original question is
embodied in this code. There is no need to invoke an interpretive
technique each time a new document is examined. This results in
very swift searching of the document collection. This is quite
important in a man-machine dialogue where system effectiveness
depends upon good system response time.
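
The idea of translating the question once and then running the result
against every document can be sketched in Python. This is an analogy
under stated assumptions, not the LABSRC 3 parser: where the original
emits machine code, the sketch builds nested closures, and it assumes
the usual precedence (NOT binds tighter than AND, AND tighter than OR),
which the report does not specify. All names are hypothetical.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

Predicate = Callable[[Set[str]], bool]
Token = Tuple[str, Optional[str]]
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\(|\)|AND\b|OR\b|NOT\b))", re.IGNORECASE)


def tokenize(text: str) -> List[Token]:
    """Split a question into TERM, AND, OR, NOT and parenthesis tokens."""
    tokens: List[Token] = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        if m.group(1) is not None:
            tokens.append(("TERM", m.group(1).strip().upper()))
        else:
            tokens.append((m.group(2).upper(), None))
        pos = m.end()
    return tokens


def compile_query(text: str) -> Predicate:
    """Parse once; return a function that tests a document's term set."""
    tokens = tokenize(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos][0] if pos < len(tokens) else None

    def advance() -> Token:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or() -> Predicate:
        parts = [parse_and()]
        while peek() == "OR":
            advance()
            parts.append(parse_and())
        if len(parts) == 1:
            return parts[0]
        return lambda doc: any(p(doc) for p in parts)

    def parse_and() -> Predicate:
        parts = [parse_not()]
        while peek() == "AND":
            advance()
            parts.append(parse_not())
        if len(parts) == 1:
            return parts[0]
        return lambda doc: all(p(doc) for p in parts)

    def parse_not() -> Predicate:
        if peek() == "NOT":
            advance()
            inner = parse_not()
            return lambda doc: not inner(doc)
        return parse_atom()

    def parse_atom() -> Predicate:
        if peek() == "TERM":
            term = advance()[1]
            return lambda doc: term in doc
        if peek() == "(":
            advance()
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return inner
        raise ValueError("expected a term or '('")

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return predicate


matches = compile_query(
    "'RETRIEVAL' AND 'EFFECTIVENESS' AND ('RECALL' OR 'PRECISION') "
    "AND NOT 'COST'")
print(matches({"RETRIEVAL", "EFFECTIVENESS", "PRECISION"}))
print(matches({"RETRIEVAL", "EFFECTIVENESS", "PRECISION", "COST"}))
```

The parsing cost is paid once per question; each document examined
afterward only runs the compiled predicate, which is the property the
report credits for good response time.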

A sample run using LABSRC 3 is shown in Fig. 6.

FIG 6: SAMPLE RUN FROM LABSRC 3

TERMINAL CLEAR
labsrc3

at your service

Q01 - DO YOU WANT WORD ASSOCIATION?
yes

Q02 - PLEASE SPECIFY ASSOCIATION FILE
kuhnsw

Q03 - DO YOU WANT SCORING?
yes

Q04 - ENTER BOOLEAN EXPRESSION
('acquisition' or 'circulation' or 'cataloging')
and 'library' and ('computer' or 'automation'
or 'mechanization') and not 'indexing'

FILE IS NOW BEING SEARCHED

007 DOCUMENTS SATISFY EXPRESSION

Q05 - DO YOU WANT RESULTS PRINTED
yes

A0022 .302
A0029 .302
A0030 .737
A0051 .302
A0135 .857
B0054 .591
B04G3 .878

Q06 - SPECIFY RESTART OR EXIT
search using no association

FILE IS NOW BEING SEARCHED

002 DOCUMENTS SATISFY EXPRESSION

Q05 - DO YOU WANT RESULTS PRINTED
yes

A0135
B04C8

Q06 - SPECIFY RESTART OR EXIT
exit

TERMINAL CLEAR

6. MACHINE TUTORIAL MODE



6.1 Introduction and Summary 

6.1.1 Initial Remarks 

As explained in section 4.3, we decided to parallel the
development of the on-line study of intellectual access with
investigation [...]
Moreover, work in this area could fittingly deal with segments of
instruction in traditional librarianship, thus serving not only to
demonstrate the medium but to put it to immediate use as well.

The term "CAI" has become the generally accepted designation for
computer-"assisted" or "administered" or "augmented" instruction at
all levels and in a variety of basic types. In order to identify the
kind of CAI we had in mind, we adopted the phrase "Machine Tutorial
Mode (MTM)". This term is intended to connote fairly free and
flexible dialogue between a teacher and an adult student, as compared
with modes better suited to elementary teaching, to computational
problem-solving, and to the imparting of simple skills.

One of our first concerns was the selection of the subject matter
for which MTM would be a suitable vehicle. The highest priority [...]
"applied librarianship" which relate directly to intellectual access
are cataloging, classification, indexing, [...] bibliography. From
among these we chose to deal with cataloging, specifically
alphabetical subject cataloging as practiced in U.S. libraries.

We were aware that the then existing techniques in CAI had not
yet been extended to apply to graduate instruction except [...]
[...]sible, certain constraints which might [...] have been
appropriate to restrict students to rigidly formatted responses,
especially since some of the concepts they would be dealing with would
admit a variety of interpretations.

6.1.2 Why Subject Cataloging? 

Subject cataloging was selected as the first course to be imple- 
mented in MTM because: 

1) It related directly to the Laboratory's ongoing inquiry into
   the nature of intellectual access.

2) It could be presented without resorting to graphics (diagrams,
   photographs, etc.) which were outside the range of the initial
   hardware configuration of the Laboratory.

3) Subject cataloging deals with an entity with which almost
   everyone, as a library user, has had some experience.

4) It has general applicability in schools offering the Master's
   Degree in Library Science.

5) It is closely related to other subjects in the traditional
   curriculum, such as descriptive cataloging and classification.

6) It is a laboratory-type course and its implementation could
   possibly reduce the student-hour workload to a degree not
   attainable with a non-lab course (e.g., university library
   administration, or history of libraries).

Also subject cataloging posed special teaching problems which we 
wanted to confront. Far from being a set of rote procedures, as some- 
times imagined by the beginner, it involves interpretation of princi- 
ples and policies which are often divergent or seem to be. In the 
teaching of subject cataloging it is frequently necessary, for example, 
to distinguish between terms which represent existing practice based 
on historical precedent and those terms which for the moment appear to
be more rational. (That is, a difficult point to put across is that 
consistency is important to the user. Thus the fact that an otherwise 
rational term is not within existing practice tends to deprive it of 
some of its rationality.) 

We postulated that if MTM succeeded - even partially - in the 
area of subject cataloging, it could be expected to succeed in others. 

6.1.3 Machine Tutorial Mode; Potentials, Constraints and Background 

Potentials and Constraints. We were not at all sure that CAI
techniques could be molded to the needs of graduate students, because
of the difficulty of attaining meaningful interaction with the
students at the desired intellectual levels. However, the potential of
the approach is such that we felt it to be well worth the attempt.

The potentials of the machine tutorial mode include those gen- 
erally attributed to "high-level" CAI: 

1) The ability to engage the student privately in an active
   conversational interchange composed of elements that can be
   predicted. Exposition is accompanied by frequent questions and
   other opportunities for the student to express himself, even
   though his answers may not in every case affect the unfolding
   of the presentation.

2) The ability to branch, i.e., to vary the presentation according
   to the machine's interpretation of student responses.
   Program reactions can be varied to suit categories of responses,
   commending those which contain desired elements and
   supplying corrective instruction in reply to those which
   contain elements of error.

3) The ability to record student performance in detail, for both
   individuals and groups. This ability lends itself not only
   to evaluation of student performance but to evaluation and
   refinement of the course itself.

4) The ability to administer instruction with patience and
   equanimity well beyond typical norms.

5) The ability to administer problem-solving exercises in a real
   time sense. In a course involving laboratory assignments,
   the system is able to provide immediate commentary on work
   submitted.

Counterposed against the above abilities are the constraints of the 
computer, chiefly its inability to cope with human dialogue for which 
it has not been amply programmed in advance. It cannot follow the 
convolutions, inversions, and implied relationships of ordinary con- 
versation. In order to compensate for this inability - that is, to 
provide a modicum of flexibility in dealing with student input - a 
considerable programming effort is required. 

Background. In trying to find models adaptable to our work, we
discovered that, although many researchers in various parts of the
country were exploring forms and applications of CAI systems, their
efforts in this broad new field were extremely diverse. At the time
work on the Information Processing Laboratory was begun, very little
had been done to compare and combine their findings. There was also
an almost universal tendency on the part of those active in CAI
development to deal with readily definable course content in order to
be able to concentrate on the behavior of the medium vis-a-vis the
student. In contrast, the model we hoped to develop for the Information
Processing Laboratory had to give equal attention to corpus and to the
mode of transfer.

6.2 Policy Decisions



6.2.1 Course Boundaries 



The first important policy decision was that of delimiting the
scope of the course. Library schools do not give equal emphasis to
subject cataloging, nor do they all introduce it in the same sequence
as does the School of Librarianship at Berkeley. However, since our
intention was to design a directly implementable segment for
incorporation in an on-going curriculum, we based our approach on the
Berkeley model. Briefly, this encompasses:

1) In the first quarter, an historical review of cataloging in
   general, consideration of subject cataloging according to
   LC practice and the Sears abridgement of LC practice, LC
   classification, Dewey Decimal classification, and a survey
   of other classification systems;

2) In the second quarter, descriptive cataloging according to
   the Anglo-American code, and filing rules;

3) In the third quarter, cataloging of special collections,
   plus consideration of classified systems.



In order to extract and deal with subject cataloging as a distinct 
element yet retain its function as an organic unit within the curric- 
ulum, it was necessary to prescribe its interfaces (particularly with 
descriptive cataloging and classification) with great care. We 
assumed that it would be administered at the very outset of the stu- 
dent’s year, with little or no preparation other than general back- 
ground information. 



It was decided very early to make no attempt to link subject cat- 
aloging with classification, for the reason that except on theoretical 
grounds neither contributes markedly to the understanding of the other. 



A brief discussion of main entries and added entries was included 
in order to give the student an insight into the overall catalog struc- 
ture, even though the practice at Berkeley is to defer this material 
until dealing with descriptive cataloging in the second quarter. 



6.2.2 Subject Authority



Probably no two specialists in the field of subject cataloging
would agree on the ideal corpus for a text on the subject, or would
agree on matters of relative emphasis, even if they could agree on 
corpus. This is to be expected, because subject cataloging is itself 
an imperfect methodology, prone to a variety of interpretations. 



We decided to base the course largely on the interpretation and
exposition of Library of Congress practice provided by the late David
Judson Haykin in his Subject Headings, a Practical Guide, GPO, 1951.
Haykin offers reasonable statements in support of this practice with-
out straying into apologetics for past inconsistencies in its applica-
tion to the LC catalog itself, and without bemusing his readers with
controversial theoretics.

This basic material is supplemented by comparisons between the LC 
and Sears lists of subject headings. In addition, numerous other 
sources, such as Needham, Coates, and Mann, were consulted in order to 
derive the benefit of their particular approaches to the subject cata- 
loging problem. 

6.2.3 Choice of CAI Programming Language

Some CAI programming is done using assembly-level computer languages.
But for the most part so-called "CAI languages" are developed
using existing higher-level languages such as FORTRAN or PL/I. PILOT
(Programmed Instruction Learning Or Teaching), written in a "selected
character string matching" (i.e., keyword matching) language or PL/I,
is an example. This new and powerful high-level CAI language was under
intensive development at the University of California Medical Center,
San Francisco, at the time we commenced work on MTM. After reviewing
other high-level languages such as COURSE-WRITER, PLANIT, LYRIC, MENTOR,
and CAL(UCI), we decided that PILOT, as designed, would provide the
best combination of features desired for the MTM program. The fact
that it was developmental during most of the period covered by this
report has afforded an opportunity for us to participate in refining
some of its specifications through the continual process of reporting
back our operational experience in working with it. (See R. Karpinski,
et al. "PILOT; a Conversational Language - User's Guide." Office of
Information Systems, University of California Medical Center, San
Francisco, California. December, 1968.)









6.3 Current Status

6.3.1 Components of the MTM Program 

The MTM program within the Information Processing Laboratory 
contains the following three components:

1) a machine-administered course in subject cataloging; 

2) a machine-administered laboratory in subject cataloging 
to accompany the above course; 

3) an empirical methodology for preparing MTM courses in 
other subjects. 

The first item is complete except for final editing and re-
finement of scoring mechanisms. The second item is half completed:
the overall design and organization and the specification of
standard subroutines have been accomplished, while implementation of
unique book problems is one-third complete. Once this is finished,
the laboratory supplement must be studied and revised in an
operational context. The third item has been formulated and defined.

Each of the three components is discussed separately in the 
following pages. 

6.3.2 Equipment 

The MTM program uses two mechanical terminals located in the
Information Processing Laboratory: an IBM 2741 and a Teletype
Model 35, both driven by an IBM 360/50 computer with a 256K memory
located at the University of California Medical Center in San
Francisco. Linkage is through acoustic couplers and commercial-
grade telephone lines. Other system hardware is described in
Appendix 3.

From the beginning, we have regarded these mechanical terminals 
as an interim installation only, since we envisaged the ultimate 
installation as consisting of cathode-ray tube consoles which are 
preferred for their high display rates and quiet operation. Since 
it has been necessary to present text in fairly large blocks in 
order to deal with the conceptual material in the subject cataloging 
course, it has become even more evident that CRTs have a commanding 
advantage over mechanical terminals. A three-console CRT system is 
slated for installation in January, 1969.

6.4 MTM Course in Subject Cataloging (201x)

6.4.1 Presentation 

The material is presented in 253 frames of varying length and 
intricacy, joined by logical connectives - cause/effect, elaboration, 
example, contrast, exceptions, ramifications, etc. - intended to 
lead the student’s attention continually onward toward the end of 
the course. Some of these connectives are stipulated in the text, 
while others are readily inferred. An example of a connective 
stipulated in the text is the following statement and question:
"Subdivided headings received a great deal of attention in Sears'
Suggestions for the Beginner. Do you remember the four basic types
we discussed earlier?"

Each frame begins with a statement, followed by a question or
requirement, followed by the student response (or input), followed
by the program reaction. If the student has answered correctly, the
program passes the student on to another frame. If the student
has answered incorrectly, the program gives the student some
cues and then goes back to student input mode.

In order to accommodate student responses which the course
author has failed to anticipate or has deemed too remote or too
general to include in his list of answers wanted or not wanted, a
'carry-all response' is contained in each frame. It is worded in
such a way as not to be a non sequitur (we hope) and to return the
student to the question by strengthening the basic cue. This
carry-all response might be simply "I don't understand you. Please
reword your answer." Or it might say "Try again, remembering what
we said about (the subject being discussed)." A frame usually
contains one or more prompting cues for the student who gives
wrong answers; if he fails the second or third time, the program
gives the correct answer and the student is passed on to another
frame. Appendix 8 contains a sample program-student interaction
sequence.
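
The frame cycle described above (statement, question, student input, program reaction, with prompting cues, a carry-all response, and release of the answer after repeated failure) can be sketched in a modern language. The following Python fragment is an illustration only and is not PILOT code; the names Frame, classify, and run_frame, the whole-word keyword matching, and the limit of three tries are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Frame:
    statement: str
    question: str
    wanted: set        # keywords counted as a desired match
    unwanted: dict     # keyword -> corrective cue (undesired match)
    carry_all: str     # reply to input the author did not anticipate
    answer: str        # given to the student after max_tries failures
    max_tries: int = 3  # assumed limit; the text says "second or third time"


def classify(frame, reply):
    """Return (kind, cue) for one student reply, using whole-word keyword matching."""
    words = set(re.findall(r"[a-z]+", reply.lower()))
    if words & frame.wanted:
        return "desired", None
    for keyword, cue in frame.unwanted.items():
        if keyword in words:
            return "undesired", cue
    return "none", frame.carry_all


def run_frame(frame, replies):
    """Return everything the program prints while handling the given replies."""
    out = [frame.statement, frame.question]
    for attempt, reply in enumerate(replies, start=1):
        kind, cue = classify(frame, reply)
        if kind == "desired":
            out.append("Correct.")
            break
        if attempt >= frame.max_tries:
            out.append("The answer is: " + frame.answer)
            break
        out.append(cue)  # prompting cue or carry-all, then back to input mode
    return out


# Hypothetical frame content, invented for the illustration.
demo = Frame(
    statement="Subdivided headings come in four basic types.",
    question="Name one of them.",
    wanted={"chronological", "geographical", "topical", "form"},
    unwanted={"alphabetical": "Alphabetical order is a filing matter. Try again."},
    carry_all="I don't understand you. Please reword your answer.",
    answer="chronological, geographical, topical, or form",
)

if __name__ == "__main__":
    for line in run_frame(demo, ["alphabetical order", "topical"]):
        print(line)
```

Matching on whole words rather than substrings is a choice made here so that a word such as "information" is not mistaken for the keyword "form"; the matching rules actually used in 201x are not specified in this section.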

The course unfolds in five broad movements, as follows: 

1. First, the student is confronted with the problem of
indexing theory, i.e., how does one represent a collection of
documents in such a way that a literate person may approach it
via many different terms?

The first third of the course is taken up with this problem
and measures designed to alleviate it.

2. Next, some of the technical problems associated with the
above measures are discussed, such as the need to limit certain
subject blocks by judicious scattering.

3. This leads to careful consideration of the idea of sub-
division as a control device. The four types of subdivision
(chronological, geographical, topical and form) are discussed, and
students are made aware that their misuse leads to classificatory
treatment inconsistent with the idea of dictionary arrangement.

4. Up to this point in the course, LC practice has been fre-
quently cited, and the LC list has been offered as a kind of bell-
wether to the flock. Now a process of reviewing and extending
some of the earlier instruction is initiated, in order to reinforce
and enrich the student's understanding. This is done through
comparison of the Sears approach with the fuller LC approach to
various problems.

5. Finally, special situations are touched upon, such as the
effect of ethnic factors on formulation of "literary" headings.

This concludes the course. 



6.4.2 Teaching Strategy and Tactics 

The elements of strategy and tactics we considered in the
course are the following: 1) style of presentation; 2) latitude
of student responses; 3) continuity (with respect to provisions
for reviewing material and for signing in and signing off); 4) stu-
dent aids. Each is discussed separately below.



1. Style of Presentation. The linearity of the machine-tutor-
ial mode calls for a strong narrative line of presentation in order
to capture and retain the student's attention from beginning to end. 
At the same time, however, one wants the student to have a feeling 
of real participation in his own instruction. Since the course is 
to be given to a highly diverse set of students, we followed a 
median path between 'guidance' and thought-provoking questions. 



Elements of the 'conversational mode' are used in the course
to give the student the feeling of individual tutoring and to
lessen the feeling that he is conversing with a computer rather
than a human. We believe that textual statements should be models
of clarity and precision and should err, if at all, on the side of
saying too little rather than saying too much. Then, if the stu-
dent fails to grasp the idea of the statement, the fact will be
apparent in the ensuing interchange, and the program will supply
correction and clarification as necessary. This kind of presenta-
tion contrasts sharply with the more diffuse style of a typical
classroom lecture which must provide universal correction, elab-
oration, and reinforcement as it goes along.

The student is usually aware from the start that the computer
cannot converse with him in human fashion. He should be brought 
to appreciate, however, that a human teacher is nevertheless trying 
to communicate with him personally through the computer medium, and 
that the progress of the interchange is dynamic and exceedingly var- 
iable. (Over 15 million variant paths through 201x are possible, 
excluding review sequences.) He should be informed, moreover, that 
many of his responses will be saved and later studied by the human 
teacher in order to refine and extend the program’s interactive 
capability. 

To illustrate, here is part of the introduction to the course: 

YOU ARE NOW ENROLLED IN MTM COURSE 201x: "SUBJECT
CATALOGING."

MY NAME IS MR. MEREDITH, AND I AM USING THE COMPUTER
IN THE SAME WAY YOU ARE: AS A MEANS OF COMMUNICATION,
ALMOST LIKE A TELEPHONE (ALBEIT A VERY COMPLICATED
ONE.) I HOPE YOU WILL EXCUSE INSTANCES WHEREIN MY
SIDE OF THE CONVERSATION STRIKES YOU AS SOMEWHAT 
CURT OR UNFAIR. THIS MAY OCCUR WHEN I FAIL TO 
ANTICIPATE A PARTICULAR RESPONSE FROM YOU. FOR ONE 
THING, IT IS ALMOST IMPOSSIBLE FOR THE PROGRAM TO 
COPE WITH DOUBLE NEGATIVES. 

THE BACKSPACE OPERATES AS AN ERASER. IF YOU MAKE 
A TYPOGRAPHICAL ERROR IN ANSWERING, JUST BACKSPACE 
TO THE PART YOU WANT TO CHANGE, AND THEN RETYPE ON 
THE SAME LINE, OR, IF YOU PREFER, ROLL THE PAPER 
MANUALLY ONE LINE BEFORE STARTING TO RETYPE. YOU 
MAY ALSO ERASE A WHOLE LINE BY TOUCHING THE "ATTEN- 
TION" KEY. 

SOME STUDENTS ARE A LITTLE NERVOUS ABOUT WORKING 
WITH A TERMINAL FOR THE FIRST TIME. WOULD YOU 
LIKE SOME PRACTICE BEFORE WE ENTER THE COURSE? 



The conversational tone is heightened by using the first- 
person singular instead of the first-person plural; the student 
should never be given the feeling that he is dealing with a 
committee. Calling the student by name, or referring to something 
he said previously and relating it to the corrective or reinforcing 
matter to be conveyed are both good devices. The use of a homely 
phrase now and then does no harm, and students are delighted to 
discover occasional touches of humor in the programmed responses. 
Examples should be colorful as well as utilitarian. 

Flat rejection of a student's response is avoided; instead,
he should if possible be told wherein he has erred. The responses
designed to handle unanticipated student entries seek to be
equally tactful, for all too often some of these unanticipated
responses prove to be quite reasonable! Sometimes it is well to
admit something which both students and author already know: "I'm
sorry, I just can't seem to understand you. Please re-word your
response."

Variety is another important element in every segment of 
the course. Variations keep the student alert, as long as they 
are not too obtrusive. Each new frame appears as something new, 
calling for a new and different type of student input. Factors 
of textual length, complexity, format, and difficulty of response 
afford great leeway in this respect.

2. Latitude of Student Responses. Ten different types of
student response are entertained in course 201x, in the following
proportions:



Multiple choice

The preponderance of multiple-choice and fill-the-blank questions
apparent from the list was unplanned and probably reflects the
greater ease with which assessment of responses to these types
can be programmed.

The question-answer form of MTM differs markedly from that
of a quiz or a formal interview because student input does not
connote finality. Not every question can be answered by reference
to preceding statements, and at times the student is asked questions
which can be answered only after a certain amount of discussion
or which require him to draw on personal resources quite outside
the programmed instruction. The counter-reply then leads him to
a correct formulation. The object, of course, is to make the
student think rather than to make him strain for clues. The
idea that one can discuss certain points without prejudice should
in the long run prove very attractive to graduate students. In a
subsequent version of the course it may be possible to provide
branching routines tailored to the background and individual
aims of students better than is now the case. This type of
refinement is envisioned as taking place over a long period of
time and will entail detailed analysis of operating records in
conjunction with student profiles.

Some CAI experts have advocated permitting students to indi- 
cate that they know correct answers simply by pressing a key which 
tells the computer, in effect, to proceed to the next frame. That 
is, since CAI supposedly allows students to proceed at their own 
pace, they should not be required to input answers they know to 
be correct. Although this would be suitable for some types of 
instruction, the need for discussion and reinforcement did not 
recommend its use in course 201x. 

Another interesting proposal is that the speed with which 
computer responses to student input are delivered should be 
delayed to resemble human response time, in order to provide a 
more comfortable rhythm in the interactive process. We believe 
that such a device is not needed in this particular course because 
only the yes/no responses seem instantaneous. By the time a stu- 
dent has read a content-bearing programmed response, the impression 
of computer speed should have dwindled. The suggestion merits 
further investigation, however. 

3. Continuity. Giving the student the means to review ma-
terial previously covered provides a sense of continuity to the
course. The subject cataloging course contains two types of re-
view: voluntary review, which is invoked by the student, and
involuntary review, which is automatically invoked by the computer.
Involuntary review is used with great caution, for if given overtly
the student may feel that he is being treated unfairly, and if given
covertly he will probably recognize the review material and think
the computer has gone awry.

Voluntary review, on the other hand, gives the student a 
heightened sense of participation in his own instruction. He may
think, "What if I had answered 'x' instead of 'y' on that tough
question a while back? I'd like to try it again." An added advan-
tage is gained through displaying a list of available review topics:
such a list reassures the student that he has not, in fact, missed 
any of the main points of the discussion (if this indeed is the 
case) and helps him formulate his own concept of how the material 
is logically organized. This last, of course, is very important 
to the memory process as we understand it. 

Unfortunately, invitations to review break the thread of dis- 
course. We judged it best to limit them in number and to plant 
them at points where interruption would be least damaging. 

Of greater concern are the breaks caused by human fatigue and 
operational time limits. It is impossible to restore the student's
stream of attention simply by flipping a switch when he logs back
in at the terminal. For this reason it is highly desirable that 
the system's "restart facility" should resume not at the point of 
previous sign-off but at an earlier point which will replay the 
last three or four frames of the preceding session. The task of 
programming this feature is substantial, but it is well worthwhile 
for the sake of continuity. Normally the student will not have 
kept notes, and the availability of hard-copy printout of previous 
sessions is problematical. 
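
The restart rule proposed above amounts to a small calculation: resume a few frames before the last one completed, never earlier than the start of the course. A minimal Python sketch follows; the function name restart_frame and the assumption that frames are numbered consecutively from 1 are ours, and the default of three replayed frames is taken from the "three or four" suggested in the text.

```python
def restart_frame(last_completed, replay=3, first_frame=1):
    """Frame at which a new session resumes, replaying the last `replay` frames.

    Assumes frames are numbered consecutively from `first_frame`.
    """
    if replay < 0:
        raise ValueError("replay must not be negative")
    return max(first_frame, last_completed - replay + 1)


if __name__ == "__main__":
    # A student who signed off after frame 40 would see frames 38, 39, 40 again.
    print(restart_frame(40))
```

In a branching course the "last three or four frames" are those the student actually traversed, so a fuller version would keep a per-student history of visited frames instead of relying on frame numbers.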

An equally important question concerns sign-off arrangements.
We are not sure of the optimum duration of a console session, for
even a single student. With some students, we expect that learning
efficiency (the instructional transfer rate) will decline markedly
after 25-30 minutes at the console. Others might be able to con-
tinue at a high level twice that long, depending not only on indi-
vidual receptivity and stamina but also on where they happen to be
in the course itself, parts of which are more difficult than others.

To set the limit of terminal sessions according to some admin- 
istratively convenient standard would be to relinquish part of the 
tutorial advantage. The student is a good judge of his own condi-
tion, and we feel that he should be allowed to sign off whenever
he is ready for closure. But what of the student who is either
too proud or too awed by the system to sign off when he really
should? In such a case the program itself should make the decision.
It can do this by first establishing a norm based on performance 
during time O'ro-SO 111 , then comparing current performance in ten- 
minute blocks against this norm. When the transfer rate declines 
to about 75% of the norm, a flag is set which causes the program
to switch to automatic signoff at the next designated closure
point. Certainly automatic signoff should be accompanied by a
short statement explaining the action. (Note: System features
permitting the above arrangements have not yet been implemented
in PILOT.)
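
The automatic signoff rule can be expressed compactly. The Python sketch below is an assumption-laden illustration, not a description of any PILOT feature: the function name signoff_flag_block is hypothetical, the transfer rate is taken to be any per-block number (for instance desired matches per ten-minute block), and the length of the norm period is left as a parameter, norm_blocks, with an arbitrary default.

```python
def signoff_flag_block(block_rates, norm_blocks=2, threshold=0.75):
    """Return the index of the first ten-minute block whose transfer rate has
    fallen to `threshold` of the norm (or below), or None if the flag never sets.

    block_rates: one transfer-rate figure per ten-minute block, in session order.
    norm_blocks: number of opening blocks averaged to form the norm (assumed).
    """
    if norm_blocks < 1:
        raise ValueError("norm_blocks must be at least 1")
    if len(block_rates) <= norm_blocks:
        return None  # session too short to compare against a norm
    norm = sum(block_rates[:norm_blocks]) / norm_blocks
    for i in range(norm_blocks, len(block_rates)):
        if block_rates[i] <= threshold * norm:
            return i
    return None


if __name__ == "__main__":
    # Norm of 10 over the first two blocks; the flag sets in the block rated 7.
    print(signoff_flag_block([10, 10, 9, 8, 7]))
```

Once the flag is set, the program would continue to the next designated closure point before signing the student off, as the text requires; that step depends on the course structure and is not modeled here.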

4. Student Aids. Students are not encouraged to resort to
the LC List or any other source to help them formulate answers.
We try to tell them enough so that they can draw logical conclu-
sions and enter into intelligent discussions without this sort of
side activity, which can become fairly expensive in terms of ter-
minal time.



6.4.3 Scoring, Measurement, and Control

Introduction. The ability to evaluate individual student
performance currently and cumulatively is often advanced as giving
CAI an overwhelming advantage over traditional means of instruction.
Evaluation or scoring can be used as the determinant of a student's
path through a course. The student's achievement can be expressed
in many ways: the number of times the program in interaction with
the student has achieved desired match, undesired match, or no
match at all, separately organized according to the following four
categories:

1. Sequential Blocks - the student's experience with a
single frame, a series of frames, a terminal session,
two or more sessions, the entire course.

2. Topical Blocks - the student's experience with a
particular concept, a facet of that concept, or
a combination of related concepts.

3. Tactical Blocks - the student's experience with
certain types of presentation or forms of question.

4. Condition Indicators - reflections of the stu-
dent's apparent efficiency, perceptiveness, or
attitude.

None of these is exclusive of the others, as there is nothing to
prevent a measure from being used again and again for different
analytical purposes.
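
The point that one measure can serve several analyses is easy to see if each program-student exchange is kept as a single record and tallied under whichever heading is wanted. The Python sketch below uses invented sample records; the field names (frame, session, topic, form, outcome) and the function tally are assumptions for illustration and do not reflect the storage layout of 201x.

```python
from collections import Counter

# Each record is one exchange; the outcome is "desired", "undesired", or "none".
events = [
    {"frame": 1, "session": 1, "topic": "specificity",
     "form": "multiple choice", "outcome": "desired"},
    {"frame": 2, "session": 1, "topic": "specificity",
     "form": "fill-the-blank", "outcome": "undesired"},
    {"frame": 2, "session": 1, "topic": "specificity",
     "form": "fill-the-blank", "outcome": "desired"},
    {"frame": 3, "session": 2, "topic": "direct access",
     "form": "multiple choice", "outcome": "none"},
]


def tally(records, key):
    """Count outcomes grouped by `key`: "frame" or "session" for sequential
    blocks, "topic" for topical blocks, "form" for tactical blocks."""
    counts = {}
    for record in records:
        counts.setdefault(record[key], Counter())[record["outcome"]] += 1
    return counts


if __name__ == "__main__":
    for heading in ("session", "topic", "form"):
        print(heading, {k: dict(v) for k, v in tally(events, heading).items()})
```

Condition indicators would be derived from such tallies (for example, a long run of "none" outcomes) and are not stored as separate records in this sketch.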

Fig. 7 gives sample tallies which illustrate the machine poten-
tials of scoring and measurement. The "subjects" refer to topical
areas within subject cataloging and to areas of student performance,
e.g., specificity, direct access, use of natural language, use of
key terms, synonyms, etc. The first type of tally shows the stu-
dent's level of performance throughout the course and also shows his
overall performance in a given subject area. The second type of
tally shows student performance in various (but not all possible)
types of response.

Another possible tally is to determine the variance in stu-
dent performance at different times during the terminal session.
Or, one can tally the areas in which the student asked for a review
and determine whether he or she did better in that subject after
the review. If the tally shows that the student did not ask for
any reviews, this may indicate that he is overconfident.

Fig. 7: SAMPLE MACHINE MEASUREMENTS

Value of Analysis. Initially, the analysis of student records
tells us much more about the course itself than it does about the
students. Until editing and re-editing on the basis of operational
experience has been accomplished and the course has thereby attained
stability, a student's record can reflect his personal achievement
only in very general terms. It can tell us if he managed to get
through the course at all, the ideas he seemed to have greatest
difficulty with, and (by means of the above-mentioned condition
indicators) whether his input has been willfully contrary, mischie-
vous, or downright stupid - in which case we would consider it
invalid for course evaluative purposes.



One should think of scoring as part of a larger system of
measurement and control. In most CAI languages, such a system
can make real-time determinations governing program execution,
i.e., the unfolding of pre-planned statements in a conditional
sequence which depends on the matching of student input with a
specified string of characters, or on the values stored in a file
of variables, or on the two in combination. The file of variables
is very useful. Singly or in combination, the variables can be used
to override explicit branching, but this is only part of their utility.
They can be referred to at any time and their contents altered,
transferred, combined, or displayed according to the wishes of
the program designer - for the purpose of furnishing to the human
teacher (author, proctor, teaching assistant) a concise record of
the student's achievement.



When the system permits automatic recording of elapsed time, 
the record is amplified and individual and mean rates of progress 
can be established. Conceivably through comparison with a current 
rate, these rates could be used dynamically to influence program 
execution. However, we expect their greatest value will lie in
their contribution to the analysis of student records.

201x Measurements and Controls. The measurements and controls
of course 201x go somewhat beyond the bounds of necessity because
of its innovative nature and the fact that it has been implemented
in a language in which investigation of new types of control had
research value. The measurement and control structure of the
course includes provision for the following:

a. Typical sign-in and registration procedure.

b. An optional console-familiarization routine.

c. Numerical variables governing mechanistic
intra-frame operations (these are either
cleared or initialized after each frame).

d. Numerical variables for storing student's
scores as they accumulate.

There are 35 categories of scoring topics, as listed in Fig. 8
(which also shows their distribution throughout the course). (All
scores are negative or "bad-tallies," these being much easier to
plan and manipulate than pre-set "good-tallies" subject to reduc-
tion.) In order to prevent excessive downgrading in a frame con-
taining numerous undesired-match or no-match possibilities, an
arbitrary maximum score, based on content, was set for each frame.
This requires temporary storage of bad-tallies until the student
has completed the frame.

e. Two voluntary review points, listing a total of 15 topics
from which the student recursively selects those he wishes to
review. When review is requested, all existing bad-tallies in the
review block are transferred to new addresses.

f. Three involuntary reviews (overt) invoked when certain
negative scores reach a set value.

g. Scattered commendatory statements based on performance in
a series of frames (e.g., "I see that you made only two mistakes
in the preceding drill section. Good for you!" In this way the
variable does double duty - it triggers the statement which in
turn quotes it.)

h. When requested by the teaching assistant or proctor, the 
201x program outputs the student record in detail: 

1) The last numbered frame completed. 

2) All negative scores, listed according to the
various scoring categories, for which see Fig. 8.

3) A list of review topics invoked by the student.

4) The negative scores attained on review.

5) (If a certain condition indicator is present):
a statement to the effect that the student seems
to be somewhat overconfident.

6) (If a certain variable exceeds a set value):
a statement to the effect that the student seems
to be sabotaging the instruction.

7) A summary of negative scores combined in five 
groups (roughly coinciding with the five main 
"movements" comprising the instruction). 

8) A net positive score is computed by subtracting 
the sum of the preceding item 7 from 1000. 
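
Three of the mechanisms listed above fit together in a few lines: bad-tallies held until the frame is complete and then capped at the frame maximum (item d), an involuntary review triggered when a negative score reaches a set value (item f), and the net positive score obtained by subtracting the accumulated negative scores from 1000 (item h.8). The Python sketch below is an illustration under stated assumptions: the class ScoreSheet, its method names, the trigger value of 5, and the sample topics are hypothetical, and the net score is taken over all recorded bad-tallies, which is assumed to equal the sum of the five group totals.

```python
class ScoreSheet:
    """Accumulates negative scores ("bad-tallies") by scoring topic."""

    def __init__(self, review_trigger=5):
        self.bad = {}                        # topic -> accumulated bad-tallies
        self.review_trigger = review_trigger  # assumed set value for item f

    def close_frame(self, topic, frame_tallies, frame_max):
        """Commit a completed frame's temporarily stored bad-tallies, capped
        at the frame maximum. Returns True when the topic total has reached
        the value that would invoke an involuntary review."""
        charged = min(frame_tallies, frame_max)
        self.bad[topic] = self.bad.get(topic, 0) + charged
        return self.bad[topic] >= self.review_trigger

    def net_score(self, base=1000):
        """Net positive score: the base less the sum of all negative scores."""
        return base - sum(self.bad.values())


if __name__ == "__main__":
    sheet = ScoreSheet()
    print(sheet.close_frame("subdivision", frame_tallies=7, frame_max=3))
    print(sheet.close_frame("subdivision", frame_tallies=2, frame_max=4))
    print(sheet.net_score())
```

The cap is applied only when the frame closes, which is why the text notes that bad-tallies must be stored temporarily until the student has completed the frame.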

Although the student may assume that his progress is being 
recorded, there is no point in advertising to him the range and 
intricacy of the process. This would be almost as unnerving for
the student as a session with a human tutor in which the latter
took copious notes, used a tape recorder, and kept clattering away 
at an adding machine during the interchange. 

Problems in Development. All this machinery takes a great
deal of time to put together and get into working order. It exacts
a price in computer core memory. It represents an intellectual
investment that must be patched and shored up each time a frame is
revised, and accordingly tends to ossify the course. In other
words, the system of measurement and control can become so intricate
that it overshadows the subject matter of the course itself. (The
situation is alleviated somewhat, fortunately, whenever a control
or measurement function is taken over by the CAI language system
in use.)

Our experience with 201x indicates that the course author
should not allow himself to become so intrigued with the various
control and measurement devices that the vitality of the instruc-
tion suffers. (This kind of situation can be forestalled in the
design phase by specifying only those measurement functions which
will contribute either to the instructional plan or to necessary
evaluations.)

FIG. 8: DISTRIBUTION OF SCORING

6.5 MTM Laboratory Supplement in Subject Cataloging (201xL)

6.5.1 Introduction

A laboratory conducted in conjunction with a course in 
cataloging is typically structured around an increasingly complex 
collection of books to which students are required to assign 
subject headings. The tasks students are required to perform 
correspond roughly to current lecture material. If the MTM 
course in subject cataloging is adopted in the Berkeley M.L.S. 
program, this would require extensive realignment of the laboratory 
supplement which accompanies the present subject cataloging 
course. We felt that since the laboratory part of the course 
would have to be revised anyway, it would be appropriate to do so 
in the same mode as the main course, i.e., the MTM mode. 

The most striking advantage to be gained from an MTM laboratory 
supplement is the same as that for an MTM basic course, i.e., the 
computer’s ability to respond immediately to input from a student 
terminal. In a conventional laboratory environment, the student 
obtains no final evaluation of his work until after an assignment 
has been handed in, manually annotated and graded by a laboratory 
assistant, and returned. The delays inherent in such a procedure 
tend to blur the student's original reasons for making certain
choices.

The computer's ability to respond immediately to student
input promises a better situation wherein students can receive
immediate feedback on their submissions, can quickly revise
these, and resubmit them until they achieve the desired result.
They can carry on a discussion of various choices without the
feeling of finality which builds up around a written laboratory
assignment.

6.5.2 Structure and Scope 

As designed, the laboratory supplement will consist of seven 
units aimed at giving the student experience in cataloging books 
of gradually increasing variety and complexity. The books 
chosen for processing were grouped as follows: 

Lab Unit 1: Books lending themselves to simple, straight-
forward subject headings, illustrating: a) the
virtues of simplicity and directness; b) the
use of natural language; c) the decision process
in choice of inclusive headings vs. multiple
headings; and d) the use of "see" and "see also"
references and their "x" and "xx" counterparts.

Lab Unit 2: Books illustrating: a) use of adjectival headings;
b) use of phrase headings; c) use of parenthetical
qualifiers.

Lab Units 3-5: Books illustrating: a) use of form subdivision;
b) use of geographical subdivision; c) use of
chronological subdivision; d) use of topical
subdivision.

Lab Unit 6: Books illustrating: a) use of proper name
headings; and b) headings appropriate for
belles lettres.

Lab Unit 7: Books illustrating technical differences in use
of the Sears List as compared with the LC List.

Physical materials will consist of some 92 books selected 
and grouped to suit the above categories. This collection can 
be added to and modified at any time. From among the 12-18 books 
in a group, the student will choose 5 and will assign one or 
more subject headings for each, submitting each in turn to MTM 
for evaluation and comment. (Sometimes the comment will amount 
to something like "You have overlooked the fact that [state-
ment]. Please sign off, reconsider, and resubmit.")



The 201x Lab supplement is expected to bulk considerably 
larger than the basic course in terms of storage requirement, as 
indicated in the following comparison of source decks: 



Course 201x
    Total             8000 cards

Course 201xL
    Introduction       250 cards
    Lab. 1            2750
    Lab. 2            2500
    Lab. 3            2500*
    Lab. 4            2500*
    Lab. 5            2500*
    Lab. 6            2500*
    Lab. 7            2500*
    Total           18,000*    (* = projected)



The specifications for the subject cataloging lab supplement 
were formulated in mid-1968, after the profile of the basic 
course 201x had been well established and after PILOT was selected
as the language of implementation. 

6.5.3 Mode of Presentation 

The mode of presentation is quite different from that which 
has been provided in the basic course. Each book is dealt with 
as a separate entity, with no logical progression joining it to 
the others of its group. Everything pertinent which can be said 
about the book and the problem of assigning to it a subject 
heading must be available during execution of the routines designed 
around it. This represents a task of some magnitude even though 
common errors of construction, punctuation, hyphenation, etc., 
can be handled through standard callable subroutines valid for 
all cases. 

It might be possible to avoid some of this reiteration by 
prescribing one set of five books, in given order, for each 
laboratory unit. The reason for not doing so was that we felt 
students would work better independently of each other, free of 
pre-ordained sequence. The arrangement has the added advantages 
of reducing mutual interference between students and of giving
the student a chance to go on and process more than the required
number of books, if he wishes, either for added practice or in
hope of clearing up some bothersome question.

6.5.4 Conserving Terminal Time

The use of reference books (the LC and Sears lists) in 
connection with MTM terminal operation raises the question of how 
to avoid wasting terminal time on-line while the student is 
engaged in considering the books themselves and deciding how 
he should treat them. The solution is to restrict on-line opera- 
tion to periods of active interchange. 

We have no data on which to base a reliable ratio, and the 
actual running time of lab units on-line will fluctuate radically 
with student achievement rates. We expect that it might be on 
the order of four minutes of study and decision to one minute of 
terminal time. For planning purposes, we consider that the 
typical student will spend an average of 12 minutes on-line per 
book, plus 36 minutes off-line, or 48 minutes per book and 4 
hours per unit, on and off-line. (These estimates are scaled 
to selectric terminal speeds.) This means that three students
should rotate on a single terminal, with an allowance of 20% for
lost motion. Extremely rapid sign-on and break routines and 
rapid file call-up present no great technical problems. It is 
expected that during a lab session a student should be able to 
restart by means of a single input. 
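
The planning figures above can be checked with a short calculation. The sketch below is illustrative only: the function and parameter names are ours, and treating the 20% allowance as padding added to each student's on-line time is an assumption, since the text does not say how the allowance is applied.

```python
import math


def terminal_plan(online_min=12, offline_min=36, books_per_unit=5,
                  lost_motion=0.20):
    """Return (minutes per book, hours per unit, students per terminal)."""
    per_book = online_min + offline_min
    unit_hours = per_book * books_per_unit / 60
    # Assumption: a terminal is occupied only during a student's on-line
    # minutes, padded by the lost-motion allowance.
    busy_per_book = online_min * (1 + lost_motion)
    students_per_terminal = math.floor(per_book / busy_per_book)
    return per_book, unit_hours, students_per_terminal


per_book, unit_hours, students = terminal_plan()
print(per_book, "minutes per book;", unit_hours, "hours per unit;",
      students, "students per terminal")
```

Under these assumptions the calculation reproduces the 48 minutes per book, 4 hours per unit, and three students per terminal stated above.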






6.5.5 Linkage with Course 201x

No cross-monitoring between 201x and 201xL is provided. It
is clearly in the student's interest not to undertake a lab unit
before he has been adequately prepared for it. On the other hand, 
it would be a mistake to withhold instruction simply because a 
lab unit had not yet been accomplished. The degree of separation 
between the basic course and the laboratory supplement avoids 
imposing, on the operational mode of either, constraints which 
might be dictated by the other. It also permits them to be 
separately implemented, an action which might be occasioned by 
phasing-in problems, equipment limitations, etc.

However, there is coordination between the laboratory supple- 
ment and the basic course, and it is arranged in the following 
way: within 201x, the student will be "cleared" for Lab. 1
after he has passed through Frame # N, for Lab. 2 after Frame # NN,
for Lab. 3 after Frame # NNN, and so on. Instead of interrupting
instruction to notify the student of clearance, the information
is withheld until signoff. (This is not yet implemented in PILOT.) The
student then or later signs on with a call for the lab unit for 
which he has been cleared or for any of the preceding labs. Or, 
he may wait until he has completed the basic course before starting 
any of the lab units. 
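
The clearance arrangement described above can be sketched as follows. This is a minimal illustration, not the PILOT implementation (which, as noted, does not yet exist). The frame numbers in CLEARANCE_FRAMES are hypothetical stand-ins for the unspecified N, NN, and NNN, and the class and method names are ours.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: the report gives them only as N, NN, NNN.
CLEARANCE_FRAMES = {1: 40, 2: 95, 3: 150}


@dataclass
class Session:
    cleared: set = field(default_factory=set)          # labs cleared so far
    pending_notice: list = field(default_factory=list)  # not yet announced

    def pass_frame(self, frame: int) -> None:
        """Record clearance silently; instruction is not interrupted."""
        for lab, threshold in sorted(CLEARANCE_FRAMES.items()):
            if frame >= threshold and lab not in self.cleared:
                self.cleared.add(lab)
                self.pending_notice.append(lab)

    def sign_off(self) -> list:
        """Release the withheld clearance notices at signoff."""
        notice, self.pending_notice = self.pending_notice, []
        return notice
```

Because notices are only queued by pass_frame and released by sign_off, the student learns of clearance at the end of a session, and the lab supplement itself needs no knowledge of the basic course.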



6.6 MTM Methodology
6.6.1 Introduction 

The MTM course in subject cataloging and its laboratory
supplement constitute readily identifiable products of research
and development on the Information Processing Laboratory. A
less tangible result, but one which could prove equally useful in
another way, is the methodology which has emerged as a direct 
result of the work on the course itself. 

The term "methodology" may be somewhat ambitious in this 
instance because it has been tested on such a limited scale and 
is, in fact, still evolutionary. However, we feel that the rules 
and procedures developed to date are valid guidelines for creation 
of computer-assisted or computer-augmented courses of instruction 
involving: a) interactive student-computer communication; b) con-
ceptual material and pedagogical strategies too complex for the
program to be generated automatically or even semi-automatically;
c) a high degree of academic responsibility at the graduate level; 
and d) use of a high-level CAI language. The tasks and subtasks 
which must be performed to produce an acceptable MTM unit are 
shown in the diagram below, and each task is discussed separately 
thereafter. 



FIG. 9: TASKS AND SUBTASKS OF AN MTM UNIT

6.6.2 System Planning

This presupposes awareness of an academic goal and of the 
problems which beset its attainment. System planning entails 
recognizing a range of existing or potential solution-contributory 
elements and from these selecting a promising set which is then 
fused into a solution for coping with the problem. In the case 
of an MTM course, effective system planning requires a multi- 
disciplinary approach establishing 1) what is to be taught; 2) the
depth, intensity, and duration of the teaching; and 3) the means 
to be used. Either the education specialist or the computer 
specialist may take the initiative in this, but a joint effort 
will be called for. 

6.6.3 Subject Definition 

This consists of establishing the boundaries of the material 
to be transferred, the identification of each item within those 
boundaries, and a clear distinction between items to be mastered 
and items to which the student need only be exposed. The 
subject specialist must present the material in a readily under- 
standable form to the educational designer and the writer. The 
subject specialist cannot assume and should not even try to 
assume all of the other subtasks. The subject specialist is 
usually a member of the faculty, whose degree of authority depends 
in part on single-minded devotion to his specialty and for whom 
excursions into unrelated activities are sometimes wasteful. The 
subject specialist’s role is integral with his status as a member 
of the faculty, and he may be thought of as the executive of a 
curriculum development committee of the faculty. (An exception 
could be argued for the subject specialist whose specialty is 
tutorial psychology and who proposes to teach that subject by 
machine, but such an exception only confirms the general rule.) 

There is much communication between the subject specialist 
and the education specialist, of course, and between the subject 
specialist and the planning body (if there is one). But regard- 
less of how well defined or how diffuse this interface may be, the 
subject specialist should be held responsible only for producing 
and monitoring the substantive material around which a course of 
instruction is designed. 

6.6.4 Educational Design 

The educational designer, who is a professional educator, 
considers the nature of the subject material in the light of what 
is known or surmised of educational psychology, determines what 
portions of it may be amenable to MTM, outlines a sequential 
strategy for its presentation, and establishes ground rules 
governing format and interchange. It is this last item which is 
the most difficult to prescribe in advance of the actual writing
because there is little procedural guidance for working in tutorial
mode either with the help of machines or without it. One suspects
that the old-fashioned tutor was less of a conversational textbook
than an ambulatory syllabus (Mark Hopkins and his log notwith-
standing).

A technique of saying things and then posing questions about 
them is standard with educators. The result is hardly "conversa-
tional" but it is instructive. The educator must decide how far
to venture into more discursive interchange and where to insert 
lacunae which he hopes the student will be able to fill in either 
spontaneously or on cue. The educational design function also 
includes the making of decisions on how extensive and elaborate 
a control and measurement structure to incorporate in the MTM course. 

These and a host of technical questions bearing on the pro- 
posed course must at least be considered before the actual writing 
process is initiated, even though in many cases only tentative 
answers can be assigned. 

6.6.5 Authorship 

The technical educational writer operates within the substan- 
tive domain established in the subject definition phase, according 
to guidelines formulated in the educational design phase. His or 
her task consists of transmuting corpus into a stream of controlled 
statements and questions against which is matched an uncontrollable 
stream of student reply. The anticipation of variance in the 
latter is the most brain-racking task of the author. He hopes 
to make the dialog meaningful, and to do so he must deal with the
meaning of as many different student replies as possible. The 
rest he can only acknowledge and try to pull together with carry- 
all responses. 

We advise the author to rough out a basic textual line, using 
the perfect student as his "straight man," in order to discern 
the pattern of frames and families of frames into which the 
material seems to fall. He can then go back and start covering 
contingencies. During the course of this operation he can use 
his own informal notation to indicate text, question, anticipated 
answer and corresponding response, and the stepped carry-all 
responses for use against unanticipated answers. It is not 
necessary that this notation coincide with existing CAI language 
operation codes or even that each item be in workable sequence 
according to a particular language. Authorship is complicated 
enough without injecting mechanistic rules into the creative 
process.
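
The elements the author must note for each frame (text, question, anticipated answers with their responses, and stepped carry-all responses) can be pictured as a small data structure. The sketch below is an illustration in Python, not PILOT and not the project's own notation. The class, its field names, and the sample wording are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    text: str
    question: str
    anticipated: dict   # normalized student reply -> prepared response
    carry_all: list     # stepped responses for unanticipated replies
    misses: int = 0     # unanticipated replies seen so far in this frame

    def respond(self, reply: str) -> str:
        # Normalize case and spacing before matching anticipated answers.
        key = " ".join(reply.lower().split())
        if key in self.anticipated:
            return self.anticipated[key]
        # Step through the carry-all responses, repeating the last one.
        step = min(self.misses, len(self.carry_all) - 1)
        self.misses += 1
        return self.carry_all[step]


def sample_frame() -> Frame:
    return Frame(
        text="A heading that is not used may still guide the reader.",
        question="Which reference leads from an unused heading to the one used?",
        anticipated={
            "see": "Right. A 'see' reference points to the heading actually used.",
            "see also": "Not quite. 'See also' links two headings that are both used.",
        },
        carry_all=[
            "I did not follow that. Please answer 'see' or 'see also'.",
            "Hint: the reader must be sent away from a heading that is not used.",
        ],
    )
```

Each unanticipated reply advances one step through the carry-all list, so the student receives progressively stronger guidance while anticipated replies always draw their own prepared response.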

We further advise the author to adopt a consistent style of 
address suitable for dialog with an intellectual equal, avoiding 
sterile phrases, stuffiness, and dogmatic assertions of authority,
yet retaining sufficient didactic control to convince the student
that he knows what he is talking about. He should strive to endow
the course with the color of a single personality, regardless of
how many people have participated in its formulation. In this
sense the authorship function includes responsibility for a
factor which can make or break the final result: the general tone
of the course.



6.6.6 Coding 



Coding should be carried on separately from the process of 
authorship, even if one is dealing with a so-called user-oriented 
language. Such languages do indeed permit people with little or 
no knowledge of programming to formulate acceptable machine input. 
The general characteristics of the language may indeed be readily
explainable, the opcodes mnemonically apt, the reserved symbols
few and the fields free. But until one has used the language
for an extended period of time, any coding requirement seems to
get in the way of authorship, and conversely the logic and accuracy
required to do a good job of coding suffer in the presence of the
creative muse. Coding may be done by research assistants or by
coders. (In situations involving uncomplicated exchanges and
standard routines, the foregoing strictures can be relaxed for
experienced author/coders, but even in such cases we have not
noticed any particular advantage in doing so.)



A form similar to that furnished in Fig. 10 is recommended 
for both authorship and coding. If the author’s text is liberally 
spaced, the necessary coding can often be accomplished directly 
thereon. Various command and control statements are entered in 
the right hand column opposite the text strings to which they 
apply. The material is then ready to be punched, the key-punch 
operator proceeding straight across the page, line by line. This 
form has proven most satisfactory among several we have tried,
and it has been used exclusively for the past eight months. 



For some writing and encoding operations (such as that of the 
laboratory routines in 201xL) flowcharting is very helpful. Flow-
charts are also an excellent means of guarding against oversight,
even though they require extra time to prepare and check. 



6.6.7 Testing and Debugging

This requires a great deal of time and effort, in spite of
the careful coding and keypunching. This is especially true if 
the system does not provide on-line edit capability, because a 
single flaw can knock out one or more frames which must then await 
recompilation before they can be further tested. The recommended 
procedure under these circumstances is to go through the entire 
course marking up the formatted listing (if a formatted listing 
is available) from the daily terminal printout, with a separate 
list of line numbers affected. A keypuncher then follows along,
revising the cards as necessary. Finally the whole deck is 
recompiled and the process begins over again. Everyone on the 
project should participate in the testing and debugging. 

Two different kinds of debugging should be recognized: 1) me-
chanistic debugging, which is concerned with discrepancies which
contaminate or spoil the running of the program, and 2) pedagogical
debugging, which is concerned with conversational aberrations,
non-sequiturs, etc., which in the main represent oversights of
authorship and educational design. Simple pedagogical debugging 
can be carried on concurrently with mechanistic debugging, and in 
practice the two are not distinguished. 

6.6.8 Revision 

This is usually referred back to the author, educator or sub- 
ject specialist. It is occasioned by patently justifiable criticism 
of the content or of the form, sequence, or tone of presentation. 

The participation of unbiased volunteers is valuable in detecting 
the necessity for this, but their comments must be carefully weighed 
in the context of their expertise and the effects of their usual 
random entry into the course. 

The decision to revise even a single frame cannot be taken 
lightly because almost invariably such revision will have reper- 
cussions in other frames. The advantage of a slight improvement 
in one statement may be lost if it results in loss of tactical 
effectiveness of associated statements. Review sequences may be 
affected as well, and numerous adjustments in the measurement 
and control apparatus may be required. This is unfortunate, but
it seems to be a characteristic of CAI that it is so highly
integrated and it represents such a large investment by the time
a course is compiled and tested that revision becomes very costly
and time-consuming. We feel that correct scheduling of sub-tasks
can help reduce the need for deep revision. Also the danger of
embarking on excessive revision because of isolated criticisms
can be reduced by maintaining a log of test runs in which the
proto-students record their comments. These log entries can then
be combined and analyzed systematically.
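
One simple way to combine such log entries is to group comments by frame and act only on frames criticized by more than one tester. The sketch below is illustrative. The entry layout (tester, frame number, comment) and the two-tester threshold are assumptions, not part of the project's procedure.

```python
from collections import defaultdict


def summarize_log(entries, min_testers=2):
    """Group comments by frame; keep frames flagged by several testers.

    entries: iterable of (tester, frame_id, comment) tuples (assumed layout).
    """
    testers_by_frame = defaultdict(set)
    comments_by_frame = defaultdict(list)
    for tester, frame_id, comment in entries:
        testers_by_frame[frame_id].add(tester)
        comments_by_frame[frame_id].append(comment)
    flagged = {
        frame_id: comments_by_frame[frame_id]
        for frame_id, testers in testers_by_frame.items()
        if len(testers) >= min_testers
    }
    return dict(sorted(flagged.items()))


log = [
    ("A", 12, "question is ambiguous"),
    ("B", 12, "unclear what is being asked"),
    ("A", 30, "tone seems abrupt"),
]
print(summarize_log(log))
```

In this example only frame 12 is reported, since the remark on frame 30 comes from a single tester and would count as an isolated criticism.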
6.6.9 Evaluation, Adoption, and Implementation

We have not yet examined these functions with sufficient 
thoroughness to warrant making any recommendations. These functions 
fall into the domain of the systems planner, the subject specialist 
and the educational specialist, as well as others outside the 
range of immediate discussion. 

6.6.10 Machine Technology 

This is not precisely a function (as are systems planning, 
educational design, etc.), but it is rather a contributory element 
affecting all of the functions or subtasks involved in creating 
an MTM course. Machine technology tends to prescribe the shape 
of the course and the operational conditions under which it will 
be utilized, but it should be fairly obvious that the tendency
should be resisted when it threatens course objectives (assuming
that the original concept is sound).

Machine technology is an ever-present factor for everyone 
involved in formulating an MTM course, with the clear exception 
of the subject specialist whom we have defined elsewhere as a 
person who should be solely and exclusively responsible for 
corpus. It follows that acquaintance with machine technology is
indispensable to anyone performing functions of system planning, 
educational design, authorship, encoding, debugging, or revision. 

6.6.11 Overlapping Functions 

In the functional scheme advanced above, any phase of activity 
may be participated in by more than one person, and one person 
may take part in two or more phases. A likely instance would be 
for the educational specialist to chair a system planning committee, 
to assume responsibility for educational design, and to do some 
of the course writing. He might have a co-author who would also 
perform part or all of the encoding. A keypuncher might learn 
enough about coding to participate in this function, as well as 
in testing, debugging, and revision. 
APPENDIX 1

CHRONOLOGICAL REVIEW OF MAJOR DEVELOPMENTS DURING PHASE I

1967

June 15       Start of project.

June-Sept.    Basic planning of the Laboratory.

June-Nov.     Evaluation of central computing facilities and
              negotiation with Berkeley campus Computer Center.

Sept.         Selection of fields for initial development; start
              of MTM course in subject cataloging; IBM-2740 remote
              terminal delivered.

Oct.          Start of development of ILR monitor program.

Nov.          Decision made to use 360/40 as central processor;
              start of work on associative retrieval; interim
              draft of MTM course in subject cataloging completed.

1968

Jan.          Authority list formed for indexing document file;
              program written to convert author names to canonical
              form; ILR monitor went into operation serving
              one terminal; final draft of subject cataloging
              course completed.

Feb.          Information science documents indexed.

March         Creation of first word association file; LABSRC1
              put into operation.

April         IBM-2314 storage device installed.

May           Machine debugging of subject cataloging course
              began; additional word association files created.

June          Work on cataloging lab supplement course began;
              ILR monitor program now able to serve multiple
              terminals simultaneously.

July          Acoustic coupler obtained; start of daily MTM
              debugging (remotely: Berkeley-San Francisco).

August        Hypothetical system performance study completed.

Sept.         LABSRC2 became operational, LABSRC3 became partially
              operational; acquired IBM-2741 remote terminal and a
              second acoustic coupler.

Oct.          Machine debugging of basic cataloging course
              completed. (First version, of course.)

Nov.          Abstracts obtained for all documents in information
              science master file. Decision made to purchase
              Sanders 720 CRT units plus related hardware. Work
              started on developing a suitable MTM language to be
              used on Berkeley campus.
PHYSICAL ARRANGEMENT OF THE INFORMATION PROCESSING LABORATORY



Location. The Information Processing Laboratory is located
centrally within the School of Librarianship, on the 4th floor of
the Doe Library Building of the University. It is expected that 
expansion through an increase in the number of student terminals 
will be accommodated in an adjacent room of the same dimensions. 

Equipment. The equipment shown in the diagram following this
page is identified as follows: 



TTY                 - Teletype Model 35-ASR with punched tape
                      output attachment.

IBM 2741            - IBM 2741 Selectric Terminal.

Acoustic Couplers   - Anderson Jacobson Acoustic Coupler
                      Model ADC 260.

CRTs                - Sanders 720 remote terminal stations
                      consisting of: Sanders 708H Terminals,
                      with 722A-3 Keyboards and 7284 modifi-
                      cation for 84-character line. One unit
                      is equipped with Sanders Photo Pen and
                      Amplifier 7220-1.

Control Unit        - Sanders 701, with 7215A Synchronous I/O,
                      1705A Memory (3), 7221-3 Peripheral Con-
                      trol Module, and 706-2 Basic Hard Copy
                      Adapter.

Data Set            - (for TTY) Western Electric 103F.*

Data Set            - (for CRT Control Unit)* General Electric
                      TDM-220 D20 Modem (functionally equiva-
                      lent to Western Electric 201B1 modem).

Serial Data Com-    - Sanders 731/1 (located at the Computer
munications Buffer    Center), serving the CRT system in the
(not shown)           manner of an IBM 2701. Mechanical ter-
                      minals are served by an IBM 2701 at the
                      central computer.

*(Plus one each located at the other end of the transmission lines.
The mechanical terminals use a commercial voice-grade Schedule 4 2-
wire line. The CRT terminals use a 4-wire Schedule 4 voice-grade line.
The central computer is about 1 cable-mile from the laboratory room.)



Computer Link-up. Communication with the Berkeley campus Com-
puting Center will be entirely via fixed circuit and data sets. The
acoustic couplers are used for tying the teletype and the selectric 
terminal with off-campus computers such as the IBM 360/50, located 
at the University of California Medical Center, San Francisco. 
INITIAL CONSIDERATIONS

Central Computer. At the time this project started, the 
Computer Center on the Berkeley campus had three distinct computing 
systems: 1) a directly-coupled IBM 7040-7094 system that had car-
ried the brunt of the computing load for several years; 2) a
CDC-6400 which had been installed shortly before for the purpose
of taking most of the load off the 7040-7094 system; 3) the IBM-
360/40 which was used primarily to support operations on the other
systems. No major computer user on the campus used the 360 as a 
primary computer. The 7040-7094 system was scheduled to be re- 
moved (and indeed was removed) in early 1968. Therefore, we had 
to decide between the CDC-6400 and the IBM-360/40 as to which 
machine would become the central computer for the Laboratory. 



There were several factors to consider in making this choice: 
speed, memory size, size and type of auxiliary storage devices, 
machine organization, availability, supporting software, cost, 
and suitability of supporting a network of remote terminals. 

After careful consideration of these factors, we chose the 360/40. 
There were two factors in favor of the 360/40 that influenced us 
strongly. The first was that the 360/40, being used by the campus 
Computer Center basically as a secondary machine, offered the 
greater promise of being available for long periods of time each 
day. This was of prime importance to the Laboratory as its facil- 
ities must be available to users many hours each day. 



The second major factor in favor of the 360/40 was its internal 
organization. This machine devotes eight binary bits to the rep-
resentation of each character. This means that the 360/40 may
distinguish internally between 256 unique characters. The CDC-6400,
on the other hand, is so organized that just six bits are devoted
to each character. Thus, it is able to represent only 64 unique
characters internally. This difference is important to the
Information Processing Laboratory. In handling library data one
must often deal with characters that are not present in the Roman
alphabet. Also, it is desirable to be able to process other non-
standard characters such as diacritical marks. We felt that the
64-character limitation of the CDC-6400 would restrict the flex- 
ibility of the Laboratory unduly. 
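
The two character-set sizes follow directly from the number of bits per character. The short sketch below works the arithmetic. The function name is ours, and the tally of letters and digits is an illustration of the constraint rather than a description of either machine's actual code table.

```python
def distinct_codes(bits_per_character: int) -> int:
    """Number of distinct characters representable in the given bits."""
    return 2 ** bits_per_character


# Upper-case and lower-case Roman letters plus the ten digits.
basic_repertoire = 26 + 26 + 10

for bits in (8, 6):
    capacity = distinct_codes(bits)
    print(bits, "bits:", capacity, "codes,",
          capacity - basic_repertoire, "left after letters and digits")
```

With six bits, a repertoire of both letter cases and the digits leaves only two codes for punctuation, diacritical marks, and non-Roman characters, whereas eight bits leave 194.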



Remote Facilities. With respect to remote terminal equip-
ment, both mechanical (remote typewriters) and cathode
ray tube (CRT) equipment were considered. A vital aspect of any
terminal network is the communication facilities that link the
network to the central computer. With mechanical terminals, one
may use a dedicated communication line in conjunction with data
sets, or one may transmit over normal telephone lines via a "dial-
up" operation using acoustic couplers. When using CRT equipment
a private leased line communication link is needed to achieve best
performance by the remote terminals.

At the beginning of this project there was already on hand a 
Teletype (Model 35) unit. Shortly thereafter, an IBM-2740 type- 
writer terminal was obtained. These units were purchased by the 
School of Librarianship. We decided that these mechanical terminals,
though comparatively slow in displaying output, would be adequate to 
meet the goals of Phase I. It did not seem appropriate to plan to 
install CRT terminals until development was well along in several 
areas. Since we would be using the on-campus 360/40, we chose to 
dedicate two local telephone lines to serving these terminals rather 
than acquiring acoustic couplers and using a "dial-up" procedure. 

CURRENT FACILITIES 

Central Facility. Much of our development work has been done
using the IBM-360/40 at the Berkeley campus Computer Center. This
is the machine that will support the Laboratory once it becomes
operational. It has a 128-K memory, four 7-track magnetic tape
drives, two card readers, one card punch, an 1100-line/minute
line printer, and an operator’s 1052 typewriter. Early in Phase I
this 360/40 system had four 2311 disc storage units with a combined
capacity of 29 million characters. As mentioned earlier, this ma-
chine is run under control of the IBM Operating System. Two of
the 2311 disc storage devices were devoted to the exclusive use
of OS itself. In April, 1968, a large 2314 disc storage device
was installed. This unit has a capacity of over 200 million
characters. Until October 14, 1968, the entire storage capacity
of the 2314 was dedicated to the exclusive use of the Institute
of Library Research. Since that date we have shared the 2314 with
other campus users, ILR retaining exclusive use of half its stor-
age capacity. The four 2311 disc units were removed in mid-Octo-
ber, 1968. The area of the Laboratory project where this large
auxiliary storage unit will be most needed is that of processing
large files. The Laboratory would be quite restricted if we were
limited to the 2311 devices.

As discussed in section 6, our MTM development work has been 
carried on using the facilities of the U.C. Medical Center in 
San Francisco. The computer we use there is an IBM-360/50 with 
a 256-K memory and the usual complement of peripheral equipment, 
plus an IBM-2314 storage device. 

Remote Terminals. In 1967, two mechanical terminal devices
were purchased with funds provided by the U.C. School of Librar-
ianship. One of these is an IBM-2740 typewriter, the other a
Teletype Model 35. The 2740 is linked to the 360/40 via an
A.T.&T. Model 103-F data set and a dedicated voice-grade tele- 
phone line. Similar communication equipment allows us to link 
the Teletype Terminal to the 360/40 as well. At the 360, data 
sets and a 2701 data adapter unit provide the required interface. 

In the early development stages of the MTM programs, project 
staff travelled to the U.C. Medical Center to carry on their 
machine work. However, in July, 1968 we obtained an Anderson-
Jacobson Model 260 acoustic coupler that we now use in a "dial-up"
procedure to link the Teletype to the 360/50 in San Francisco.
Effort on the MTM work increased to the point that in early
September, 1968 we obtained an IBM-2741 terminal and a second
acoustic coupler. We now routinely have two remote terminals
being used in developing MTM programs communicating with San Fran-
cisco simultaneously.
APPENDIX IIa: SUBJECT AUTHORITY LIST

INFORMATION PROCESSING LABORATORY PROJECT

JANUARY 31, 1968
REVISED DECEMBER 16, 1968



ABBREVIATIONS

S      SEE
SA     SEE ALSO
SN     IN THE SENSE OF (I.E. SCOPE NOTE)
*      NO DOCUMENTS YET INDEXED WITH THIS TERM
+      TERM NOT ALLOWED, RELATED TERM TO BE USED
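
The way these codes steer an indexer or searcher from an entered term to an allowed one can be sketched as a small lookup. The fragment below is an illustration only. It holds a handful of entries taken from the list, and the dictionary layout and function names are assumptions, not the format of the project's machine file.

```python
# A few entries from the authority list, keyed by term.
# "S" marks a disallowed term and names the term to use instead.
# "SA" names related terms that are also allowed.
AUTHORITY = {
    "ALGOL": {"S": ["PROG. LANGUAGE"]},
    "PROG. LANGUAGE": {},
    "AUTOMATION": {"SA": ["MECHANIZATION"]},
    "MECHANIZATION": {"SA": ["AUTOMATION"]},
}


def resolve(term: str):
    """Return the allowed form of a term, or None if it is not listed."""
    entry = AUTHORITY.get(term.upper())
    if entry is None:
        return None
    see = entry.get("S")
    return see[0] if see else term.upper()


def related(term: str) -> list:
    """Return the SEE ALSO terms recorded for an allowed term."""
    return list(AUTHORITY.get(term.upper(), {}).get("SA", []))


print(resolve("algol"), related("automation"))
```

A term carrying an S reference is never returned as itself, which mirrors the rule that a disallowed term must be replaced by its related term.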



*ABBREVIATION
ABSTRACT
ABSTRACTING
ACCESS
ACCESSION NUMBER
ACCURACY
ACQUISITION
ADDRESS
ADMINISTRATION
ALGEBRA
+ALGOL
    S PROG. LANGUAGE
ALGORITHM
ALPHABETIC
ALPHABETIC ORDER
ALPHANUMERIC
*ALTERNATIVES
AMBIGUITY
ANALOGY
ANALYSIS
ANSWER
*ANTHOLOGY
    SA BIBLIOGRAPHY
APPLICATION
+ARITHMETIC
    S MATHEMATICS
ARRAY
+ARTICLE
    S DOCUMENT
ARTIFICIAL INTEL
ASSIGNED
ASSOCIATION
ASSOCIATIVE
+ATTRIBUTE
    S CHARACTERISTIC
AUTHOR
AUTHORITY LIST
    SA THESAURUS
AUTO ABSTRACTING
AUTO. INDEXING
AUTOMATIC
AUTOMATION
    SA MECHANIZATION



BATCH PROCESSING
BIBLIOGRAPHIC
BIBLIOGRAPHY
    SA ANTHOLOGY
BINARY
BOOK
BOOLEAN
    SA LOGICAL

CALL NUMBER
CANONICAL
    SA NORMALIZED
CARD
CARD CATALOG
CATALOG
CATALOGING
CATEGORIES
CENTERS
CENTRALIZED
CHARACTERISTIC
CHEMICAL
CIRCULATION
CITATION
CITATION INDEX
*CLAIM
    SA COPYRIGHT
    SA PATENT
CLASSIF. SCHEME
CLASSIFICATION
CLERICAL
+CLUE WORD
    S KEYWORD
CLUMP
CLUSTER
CO-OCCURRENCE
+COBOL
    S PROG. LANGUAGE
CODE
    SN MEDIA DESIGNATION
CODING
    SN COMPUTER CODING
COEFFICIENT
COLLECTION
*COLLOQUIUM
    SA CONFERENCE
    SA MEETING
    SA SYMPOSIUM
COMBINATIONS
+COMIT
    S PROG. LANGUAGE
COMMUNICATION
COMP LINGUISTICS
COMPARISON
COMPUTER
CONCEPT
CONCORDANCE
CONDITIONAL PROB
CONFERENCE
    SA COLLOQUIUM
    SA MEETING
    SA SYMPOSIUM
CONNECTION
+CONSECUTIVE
    S ORDER
+CONSOLE
    S REMOTE TERMINAL
CONTENT
CONTENT ANALYSIS
CONTEXT
CONTROL
CONTROLLED
CONVENTIONAL
CONVERSION
COORDINATE
COORDINATE INDEX
    SA UNITERM SYSTEM
*COPYRIGHT
    SA CLAIM
    SA PATENT
+CORE
    S STORAGE
CORRELATION
COST
COUNT
COUPLING
CRANFIELD
CRITERIA
CRITICAL
    SN REVIEWING, NOT VITAL
CROSS REFERENCE
CURRENT AWARENES
CURRICULUM
+CUSTOMER
    S USER



DATA
*DECENTRALIZATION
DECISION THEORY
DEDUCTIVE
DEGREE
DEPTH OF INDEXIN
DESCRIPTIVE
DESCRIPTOR
    SA KEYWORD
    SA TAG
    SA TERM
DESIGN
    SA PLANNING
DICTIONARY
+DIFFERENCE
    S COMPARISON
+DIGITAL COMPUTER
    S COMPUTER
DISCRIMINANT
+DISPLAY
    S REMOTE TERMINAL
DISSEMINATION
*DISSERTATION
DOCUMENT
    SA JOURNAL
DOCUMENTATION
DUAL DICTIONARY

+ECONOMICS
    S COST
EDITING
EDUCATION
EFFECTIVENESS
    SA EFFICIENCY
EFFICIENCY
    SA EFFECTIVENESS
+ELECTRONIC COMPUTER
    S COMPUTER
+EMPIRICAL
    S EXPERIMENT
+ENCODING
    S CODING
ENTROPY
ENTRY
    SN ACCESS POINT
ERROR
EVALUATION
    SA TEST
    SA UTILITY
    SA VALUE
EXPERIMENT
EXTRACT

FACET
FACETED CLASSIF.
FACT RETRIEVAL
+FACTOR ANALYSIS
    S STAT. METHOD
FALSE DROP
FEEDBACK
FILE
    SA LIST
    SA STRING
FILE ORGANIZATIO
FLOW OF INFO.
FORMAT
+FORTRAN
    S PROG. LANGUAGE
FREQUENCY
FUNCTION
    SN OPERATIONAL, NOT MATHEMATICAL

GENERAL
GENERATION
    SN PRODUCTION
GENERIC
+GOAL
    S OBJECTIVE
GRAMMAR
+GROUP
    S CLUMP

HARDWARE
    SN COMPUTERS, MICROFILM EQUIPMENT, ETC.
    SA MECHANICAL
+HEADINGS
    S SUBJECT HEADING
HIERARCHY
HISTORICAL
+HUMAN
    S MANUAL
+HUMAN INDEXING
    S MANUAL INDEXING

*IDENTICAL
IDENTIFICATION
ILLUSTRATION
*IMPLEMENTATION
INDEPENDENT
INDEX
INDEXING
INFERENCE
INFO. RETRIEVAL
INFO. SCIENCE
INFORMATION
INPUT
+INQUIRER
    S USER
+INQUIRY
    S QUESTION
+INSTRUCTION
    S EDUCATION
INTELLECTUAL
INTERDISCIPLINAR
INTERFACE
INTERPRET
+INTERROGATE
    S QUESTION
+INTERSECTION
    S VENN DIAGRAM
INTRODUCTORY
INTUITIVE
INVENTORY
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JOURNAL 

SA DOCUMENT 



KEYPUNCH 

KEYWORD 

SA descriptor 

SA TAG 
SA TERM 

KWIC 



LANGUAGE 

LARGE 

LATTICE 

LAW 

+LEVEL 

S DEGREE 
+LEXICAL 

S ALPHABETIC 
♦LEXICON 

S DICTIONARY 
LIBRARIAN 
LIBRARY 
LINGUISTIC 
LINK 
LIST 

SA FILE 
SA STRING 
LITERATURE 
LOGIC 
LOGICAL 

SA BOOLEAN 



+MACHINE 

S HARDWARE 
MACHINE-READABLE 
♦MAGNETIC TAPE 

S STORAGE 
MAN-MACHINE 
MANUAL 

MANUAL INDEXING 
MATCH 

MATHEMATICAL 

MATHEMATICS 

SA PROBABILITY 
MATRIX 
MEANING 
MEASURE 
MECHANICAL 

SA HARDWARE 
MECHANIZATION 

SA AUTOMATION 
MEDIUM 



MEETING 

SA COLLOQUIUM 
SA CONFERENCE 
SA SYMPOSIUM 
♦MEMORY 

S STORAGE 
METHODOLOGY 
♦METRIC 

S MEASURE 
MICROFICHE 
MICROFILM 
MODEL 

SA SIMULATION 
MODIFICATION 
MULTIPLE 



NATIONAL 

NATURAL 

NATURAL LANGUAGE 

NEEDS 

NETWORK 

SN ORGANIZATIONAL STRUCTURE 
SA ORGANIZATION 

NOISE 

♦NOMENCLATURE 

S NOTATION 
NON-CONVENTIONAL 
NON-DISCRIMINANT 
NON-FILE 
NON-RANDOM 
NON-RELEVANT 
♦NORMALIZED 

SA CANONICAL 
NOTATION 

SA TERMINOLOGY 
NUMBER 
NUMERIC 



OBJECTIVE 

SN GOAL, NOT AS OPPOSED 
TO SUBJECTIVE 
♦OCCURRENCE 
OFF-LINE 
ON-LINE 
OPERATION 
OPTIMIZATION 
ORDER 

ORGANIZATION 

SA NETWORK 
OUTPUT 



♦PAIR 

S WORD ASSOCIATION 
♦PAPER 

S DOCUMENT 
PARAMETER 

SA VARIABLE 
PARSE 
PATENT 

SA CLAIM 
SA COPYRIGHT 
PATTERN 
PERFORMANCE 
♦PERIODICAL 

S JOURNAL 
PERMUTED 
PERTINENT 

SA RELEVANT 
PHILOSOPHY 

SA POLICY 
♦PHOTO 

S GRAPHICS 
PLANNING 

SA DESIGN 
♦PLOT 

S GRAPH 
♦POLICY 

SA PHILOSOPHY 
♦POPULATION 

S COLLECTION 
PRECISION 
PREDICTION 
♦PRINCIPLE 
♦PRINT-OUT 

S OUTPUT 
PRINTING 
♦PRIVACY 

S SECRECY 
PROBABILITY 

SA MATHEMATICS 
PROCEDURE 
PROCEEDINGS 
PROCESSING 
PROFILE 
PROG. LANGUAGE 
PROGRAM 

SN COMPUTER PROGRAM 
SA ROUTINE 
SA SOFTWARE 
SA SUBROUTINE 
PROGRAMMED 
♦PROPERTY 

S CHARACTERISTIC 
PSYCHOLOGY 
♦PUBLICATION 

S DOCUMENT 
PUNCHED 
♦PUNCHED-CARD 

S STORAGE 
PUNCTUATION 
♦PURPOSE 

S OBJECTIVE 



QUALITATIVE 

SA SUBJECTIVE 
QUANTITATIVE 
♦QUERY 

S QUESTION 
QUESTION 

SN BOTH NOUN AND VERB 
QUESTION-ANSWER 



RANDOM 
RANDOM-ACCESS 
RANK 
READING 
REAL-TIME 
RECALL 
RECOGNITION 
RECORD 
♦RECORDED INFO. 

S RECORD 
RECURSIVE 

SA ITERATIVE 
REDUNDANCY 
REFERENCE 
♦REJECTION 
RELATED 
RELATIONSHIP 
RELATIVE 
RELEVANCE 
RELEVANT 

SA PERTINENT 
♦REMOTE TELETYPES 

S REMOTE TERMINAL 
REMOTE TERMINAL 

SA VISUAL DIS. CON. 
♦REPORT 

S DOCUMENT 
♦REQUEST 

S QUESTION 
RESEARCH 
♦RESPONSE 

S ANSWER 
RESPONSE TIME 
RETRIEVAL 
RETRIEVAL SYSTEM 
REVIEW 

SA SUMMARY 
SA SURVEY 
ROLE 
ROUTINE 

SN COMPUTER ROUTINE 
SA PROGRAM 
SA SOFTWARE 
SA SUBROUTINE 
RULE 



SAMPLE 
SCANNING 
SCIENTIFIC 
SCOPE NOTE 
SEARCH CRITERIA 
SEARCH STRATEGY 
SEARCHING 
SECRECY 
SEE ALSO 

SN AS USED IN CATALOGING 
SEE-REFERENCE 
SELECTION 
SELECTIVE DISSEM 
SEMANTIC 

SA SYNTAX 
SEQUENCE 
♦SERIAL 

S JOURNAL 
SERVICE 
SET THEORY 
SETS 
SHELFLIST 
SIGNIFICANCE 
SIMULATION 

SA MODEL 
SIZE 
SMALL 
SOCIAL IMPLIC. 
SOFTWARE 

SA PROGRAM 
SA ROUTINE 
SA SUBROUTINE 
SORTING 
SOURCE 
SPECIALIZED 
SPECIFICITY 
STANDARDIZATION 
STAT ASSOCIATION 
STAT. ANALYSIS 

SA STAT. METHOD 
STAT. METHOD 

SA STAT. ANALYSIS 
STATE-OF-THE-ART 
STATISTICAL 
♦STOCHASTIC 

S RANDOM 
STORAGE 
STRING 

SA FILE 
SA LIST 
STRUCTURE 
SUBJECT 
SUBJECT HEADING 
SUBJECT INDEXING 
SUBJECT-CATALOG. 
♦SUBJECTIVE 

SA QUALITATIVE 
SUBROUTINE 

SA PROGRAM 
SA ROUTINE 
SA SOFTWARE 
SUMMARY 

SA REVIEW 
SA SURVEY 
SURVEY 

SA REVIEW 
SA SUMMARY 
SYMBOL 
SYMBOLIC LOGIC 
SYMPOSIUM 

SA COLLOQUIUM 
SA CONFERENCE 
SA MEETING 
SYNONYM 
SYNTACTIC ANAL. 
SYNTAX 

SA SEMANTIC 
SYSTEM 



TABLE 

SA GRAPH 
TAG 

SA DESCRIPTOR 
SA KEYWORD 
SA TERM 
♦TAPE 

S STORAGE 
♦TEACHING 

S EDUCATION 
TECHNICAL 
TECHNICAL REPORT 
TECHNOLOGY 
TELEGRAPHIC ABS. 
TERM 

SA DESCRIPTOR 
SA KEYWORD 
SA TAG 
♦TERMINAL 

S REMOTE TERMINAL 
TERMINOLOGY 

SA NOTATION 



TEST 

SA EVALUATION 
SA UTILITY 
SA VALUE 
TEXT 
THEORY 
THESAURUS 

SA AUTHORITY LIST 
TIME 
TIME-SHARING 
TITLE 
♦TOPIC 

S SUBJECT 
TRANSFORMATION 
TRANSLATION 
♦TRANSLITERATION 
TRANSMISSION 
TREE 
TREE STRUCTURE 
TRUNCATION 
♦TYPE STYLE 
TYPE-SETTING 
♦TYPOGRAPHICAL 



♦UNION 

SN SET THEORY UNION 
S VENN DIAGRAM 
♦UNION CATALOG 
♦UNITERM 

S DESCRIPTOR 
UNITERM SYSTEM 

SA COORDINATE INDEX 
UPDATING 
USER 
UTILITY 

SA EVALUATION 
SA TEST 
SA VALUE 



VALIDATION 
VALUE 

SA EVALUATION 
SA TEST 
SA UTILITY 
VARIABLE 

SA PARAMETER 
VECTOR 
VENN DIAGRAM 
♦VISUAL DIS. CON. 

SA REMOTE TERMINAL 
VOCABULARY 



WEIGHT 
WEIGHT INDEXING 
WORD 
WORD ASSOCIATION 
WORD FREQUENCY 
♦WORD PAIRS 

S WORD ASSOCIATION 
APPENDIX 4b: INDEX TERM LIST SORTED ON FREQUENCY OF ASSIGNMENT 



INDEX TERM NO. OF REFS. 



INFO. RETRIEVAL 84 
SYSTEM 84 
DOCUMENT 78 
COMPUTER 69 
STORAGE 69 
INDEXING 64 
RETRIEVAL 63 
INFORMATION 59 
SEARCHING 58 
ANALYSIS 53 
CLASSIFICATION 52 
STRUCTURE 52 
INDEX 49 
RELEVANCE 49 
LANGUAGE 46 
EVALUATION 44 
EXPERIMENT 44 
ASSOCIATION 42 
SEMANTIC 41 
MATRIX 39 
NATURAL LANGUAGE 38 
WORD 36 
FREQUENCY 35 
DESCRIPTOR 34 
QUESTION 33 
DICTIONARY 32 
PROGRAM 32 
USER 32 
DATA 31 
MEASURE 31 
TRANSLATION 31 
LIBRARY 30 
RELATIONSHIP 30 
THESAURUS 30 
HIERARCHY 29 
ALGORITHM 28 
AUTOMATIC 28 
COMMUNICATION 28 
INPUT 28 
LINGUISTIC 28 
STATISTICAL 28 
SYNTAX 28 
PROBABILITY 27 
GRAMMAR 26 
OUTPUT 26 
QUESTION-ANSWER 26 
REFERENCE 26 
WORD ASSOCIATION 25 
LITERATURE 24 
FILE 22 
LOGIC 22 
MATCH 22 
PROCESSING 22 
RELEVANT 22 



SEARCH STRATEGY 

SYMBOL 

TECHNICAL 

AUTO* INDEXING 

BIBLIOGRAPHIC 

SCIENTIFIC 

STAT. METHOD 

CONCEPT 

EFFICIENCY 

RECALL 

TEXT 

THEORY 

ABSTRACT 

CO-OCCURRENCE 

CODING 

KEYWORD 

TRANSFORMATION 

WEIGHT 

GRAPH 

VOCABUL ARY 

CLUMP 

HARDWARE 

MODEL 

SUBJECT 

SYNONYM 

SYNTACTIC ANAL, 
TREE 

COMPARISON 

COORDINATE INDEX 

CORRELATION 

MECHANIZATION 

TAG 

TEST 

ACCESS 

BIBLIOGRAPHY 
CLASSIF. SCHEME 
CONTENT 
COST 

EDUCATION 

LATTICE 

LINK 

MATHEMATICAL 
RETRIEVAL SYSTFM 
TITLE 

ASSOCIATIVE 

MEANING 

NETWORK 

RESEARCH 

SCANNING 

SFRVICE 

ABSTRACTING 

BOOLEAN 

CITATION INDFX 






CLUSTER 


13 


FUNCTION 


13 


LIBRARIAN 


13 


LIST 


13 


ORDER 


13 


PARSE 


13 


RANDOM 


13 


SEQUENCE 


13 


CITATION 


12 


MAN-MACHINE 


12 


PROG. LANGUAGE 


12 


SURVEY 


12 


VALUE 


12 


VARIABLE 


12 


AUTO ABSTRACTING 


11 


CODE 


11 


COEFFICIENT 


11 


COLLECTION 


11 


CONTEXT . 


11 


DISSEMINATION 


11 


METHODOLOGY 


11 


NOISE 


11 


PRECISION 


11 


PROCEDURE 


11 


SETS 


11 


SUBJECT HEADING 


11 


VECTOR 


11 


ADDRESS 


10 


AUTOMATION 


1C 


CHARACTERISTIC 


10 


CURRICULUM 


10 


DEGREE 


10 


DOCUMENTATION 


10 


ILLUSTRATION 


10 


MECHANICAL 


10 


NOTATION 


1C 


STAT ASSOCIATION 


10 


ALGEBRA 


9 


CATEGORIFS 


9 


COORDINATE 


9 


DESIGN 


9 


ERROR 


9 


GFNERAL 


9 


GENERIC 


9 


INFERENCE 


9 


INFO. SCIENCE 


9 


KWIC 


9 


NFFDS 


9 


PARAMETER 


9 


ROLF 


9 


RULE 


9 


STATF-OF-THE-ART 


9 


STRING 


9 


TREE STRUCTURE 


9 


BOOK 


8 


CARD 


8 



CATALOG 

CONTENT ANALYSIS 
CRITERIA 

DEPTH OF INDEXTN 
EDITING 
FALSE OROP 
FEEDBACK 

FILE ORGANIZATIO 

INTRODUCTORY 

JOURNAL 

MANUAL 

MATHEMATICS 

ORGANIZATION 

PROCEEDINGS 

ROUTINE 

SPECIFICITY 

TECHNOLOGY 

ANSWER 

BINARY 

CATALOGING 

CONNECTION 

COUPLING 

CROSS REFERENCE 

CURRENT AWARENES 

DESCRIPTIVE 

INTERPRET 

REMOTE TERMINAL 

ALPHABETIC 

AMBIGUITY 

COMP LINGUISTICS 

CONDITIONAL PROB 

CONFERENCE 

ENTRY 

LAW 

PATENT 

PERMUTED 

PREDICT ION 

RECOGNITION 

REDUNDANCY 

RELATIVE 

SIGNIFICANCE 

TIME-SHARING 

WORD FREQUENCY 

AUTHOR 

CANONICAL 

CENTERS 

CHEMICAL 

DISCRIMINANT 

FACET 

FLOW OF INFO. 
FORMAT 

INTERDI SC IPL INAR 

IRRELEVANT 

MEETING 

NATIONAL 

PATTERN 
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PERFORMANCE 


5 


ON-LINE 


oertinent 


«; 

jf 


PHILOSOPHY 


PUNCHED 


5 


PLANNING 


RANDOM- ACCESS 


5 


PROGRAMMED 


RECURSIVE 


5 


READING 


REVIEW 


5 


SPECIAL IZED 


SIMULATION 


5 


TELEGRAPHIC ABS. 


SORTING 


5 


TRANSMISSION 


SUBJECT INDEXING 


5 


TRUNCAT ION 


SYMPOSIUM 


5 


VENN DIAGRAM 


TABLE 


5 


ALPHANUMERIC 


TIME 


5 


ANALOGY 


UNITERM SYSTEM 


5 


ARTIFICIAL INTEL 


WEIGHT INDEXING 


5 


ASSIGNEO 


ACCESSION NUMBER 


4 


AUTHORITY LIST 


CIRCULATION 


4 


CARD CATALOG 


CRANFIELD 


4 


EFFECTIVENESS 


ENTROPY 


4 


FACT RETRIEVAL 


FACETED CLASSIF. 


4 


IDENTIF ICATION 


HISTORICAL 


4 


INTERFACE 


INTELLECTUAL 


4 


INTUITIVE 


LOGICAL 


4 


INVENTORY 


OPERATION 


4 


LARGE 


RECORD 


4 


MANUAL INDEXING 


SAMPLE 


4 


NATURAL 


SELECTION 


4 


NON-CONVENT IONAL 


SELECTIVE DISSFM 


4 


OBJECTIVE 


SOFTWARE 


4 


OFF-LINE 


SOURCE 


4 


PROFILE 


SUBJECT -CATALOG, 


4 


QUALITATIVE 


SYMBOLIC LOGIC 


4 


QUANTITATIVE 


UTILITY 


4 


RANK 


ACCURACY' 


3 


SCOPE NOTE 


ACQUISITION 


3 


SEARCH CRITERIA 


APPLICAT ION 


3 


SEE ALSO 


ARRAY 


3 


SET THEORY 


CENTRALIZED. 


3 


SIZE 


CLERICAL 


3 


SOCIAL IMPLIC. 


COMBINATIONS 


3 


SUBROUTINE 


CONCORDANCE 


3 


SUMMARY 


CONTROLLED 


3 


TYPE-SETTING 


CONVENTIONAL 


3 


VALIDATION 


CONVERSION 


3 


ADMINISTRATION 


COUNT 


3 


ALPHABETIC ORDER 


DECISION THEORY 


3 


BATCH PROCESSING 


DEDUCT IVF 


3 


CALL NUMBER 


EXTRACT 


3 


CONTROL 


GENERATION 


3 


CRITICAL 


KEYPUNCH 


3 


DUAL DICTIONARY 


MACHINE-READABLE 


■a 


GOVERNMENT 


MFDIUM 


3 


GRAPHICS 


MICROFICHE 


3 


INDEPENDENT 


MICROFILM 


3 


ITERATIVE 


MULTIPLE 


3 


MODIFICATION 


NON-RELEVANT 


3 


NON-DISCRIMINANT 


NUMERIC 


3 


NON-FILE 



3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 

1 

1 

1 

1 

X 

1 

l 

1 

i 

l 

t 

l 

l 

l 







NON-RANDOM 1 
NIJMBFR 1 
OPTIMIZATION 1 
PRINTING 1 
PSYCHOLOGY 

PUNCTUATION 1 
REAL-TIME 1 
RELATED 1 
RESPONSE TIME 1 
SEE— R FFER ENCF 1 
SHELFLIST 1 
SMALL 1 
STANDARDIZATION l 
STAT. ANALYSIS 1 
TECHNICAL REPORT I 



TERMINOLOGY 

UPDATING 

ABBREVIATION 

ALTERNATIVES 

ANTHOLOGY 

CLAIM 

COLLOQUIUM 

COPYRIGHT 

DECENTRALIZATION 

DISSERTATION 

IDENTICAL 

IMPLEMENTATION 

INVERTED 

NORMALIZED 

OCCURRENCE 

PP INCIPLF 

REJECTION 

SECRECY 

TRANSLITERATION 
TYPE STYLE 
TYPOGRAPHICAL 
UNION CATALOG 
VISUAL DIS. CON. 
APPENDIX 4c: INDEX TERM LIST ALPHABETICALLY SORTED 



INDEX TERM NO. OF REFS. 

CLASSIFICATION 


52 


ABBREVIATION 


0 


CLERICAL 


3 


ABSTRACT 


19 


CLUMP 


17 


ABSTRACTING 


13 


CLUSTER 


13 


ACCESS 


15 


CO-OCCURR ENCE 


19 


ACCESSION NUMBER 


A 


CODE 


11 


ACCURACY 


3 


CODING 


19 


ACQUISITION 


3 


COEFFICIENT 


11 


ADDRESS 


10 


COLLECTION 


11 


ADMINISTRATION 


I 


COLLOQUIUM 


0 


ALGEBRA 


9 


COMBINATIONS 


3 


ALGORITHM 


28 


COMMUNICATION 


28 


ALPHABETIC 


6 


COMP LINGUISTICS 


6 


ALPHABETIC ORDER 


l 


COMPARISON 


16 


ALPHANUMERIC 


2 


COMPUTER 


69 


ALTERNATIVES 


0 


CONCEPT 


20 


AMBIGUITY 


6 


CONCORDANCE 


3 


ANALOGY 


2 


CONDITIONAL PROP 


6 


ANALYSIS 


53 


CONFERENCE 


6 


ANSWER 


7 


CONNECTION 


7 


ANTHOLOGY 


0 


CONTENT 


15 


APPLICATION 


3 


CONTENT ANALYSIS 


8 


ARRAY 


3 


CONTEXT 


11 


ARTIFICIAL INTEL 


2 


CONTROL 


1 


ASSIGNED 


2 


CONTROLLED 


3 


ASSOCIATION 


42 


CONVENTIONAL 


3 


ASSOCIATIVE 


14 


CONVERSION 


3 


AUTHOR 


5 


COORDINATE 


9 


AUTHORITY LIST 


2 


COORDINATE INDEX 


16 


AUTO ABSTRACTING 


11 


COPYRIGHT 


C 


AUTO. INDEXING 


21 


CORRELATION 


16 


AUTOMATIC 


28 


COST 


15 


AUTOMATION 


10 


COUNT 


3 


BATCH PROCESSING 


1 


COUPLING 


7 


BIBLIOGRAPHIC 


21 


CRANFIELD 


4 


BIBLIOGRAPHY 


15 


CRITERIA 


8 


BINARY 


7 


CRITICAL 


I 


BOOK ' 


8 


CROSS REFERENCE 


7 


BOOLEAN 


13 


CURRENT AWARENES 


7 


CALL NUMBER 


1 


CURRICULUM 


10 


CANONICAL 


5 


DATA 


31 


CARD 


8 


DECENTRALIZATION 


C 


CARD CATALOG 


2 


DECISION THEORY 


3 


CATALOG 


8 


DEDUCT IVF 


3 


CATALOGING 


7 


DEGREE 


10 


CATEGORIES 


9 


DEPTH OF INDEXIN 


8 


CENTFRS 


5 


DESCRIPTIVE 


7 


CFNTRALIZEO 


3 


DESCRIPTOR 


34 


CHARACTERISTIC 


10 


DESIGN 


9 


CHEMICAL 


5 


DICTIONARY 


32 


CIRCULATION 


4 


DISCRIMINANT 


5 


CITATION 


12 


DISSFMINAT ION 


11 


CITATION INDEX 
CLAIM 

CLASSIF. SCHEME 


13 

C 
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DISSERTATION 


0 



1 



'DOCUMENT 


78 


JOURNAL 


8 


DOCUMENTAT ION 


10 


KEYPUNCH 


3 


DUAL niCT IONARY 


1 


KEYWORD 


19 


FDITING 


9 


KWIC 


9 


EDUCATION 


15 


LANGUAGE 


46 


EFFECTIVENESS 


? 


LARGE 


2 


EFFICIENCY 


2C 


LATTICE 


15 


ENTROPY 


A 


LAW 


6 


ENTRY 


6 


LIBRARIAN 


13 


ERROR 


9 


LIBRARY 


30 


EVALUATION 


44 


LINGUISTIC 


28 


FXPFR IMENT 


44 


LINK 


15 


EXTRACT 


3 


LTST 


13 


FACET 


5 


LITERATURE 


24 


faceted classif. 


4 


LOGIC 


22 


FACT RETRIEVAL 


2 


LOGICAL 


4 


falsf drop 


8 


MACHINE-READABLE 


3 


FFFDRACK 


8 


MAN-MACHINE 


12 


FILE 


22 


MANUAL 


p 


FILE ORGANIZATIO 


p 


MANUAL INDEXING 


2 


FLOW OF INFO* 


5 


MATCH 


22 


format 


5 


MATHEMATICAL 


15 


FREQUENCY 


35 


MATHEMATICS 


8 


FUNCTION 


13 


MATRIX 


39 


GENERAL 


9 


MEANING 


14 


GENFRAT ION 


3 


MEASURE 


31 


GFNER IC 


9 


MECHANICAL 


1C 


GOVERNMENT 


1 


MECHANIZATION 


16 


grammar 


26 


MEDIUM 


3 


graph 


18 


MEETING 


5 


GR ARM ICS 


1 


METHODOLOGY 


11 


HAROWAR != 


17 


MICROFICHE 


3 


HIERARCHY 


29 


MICROFILM 


3 


HISTORICAL 


4 


MODEL 


17 


I DENT ICAL 


0 


MODIFICATION 


1 


I OFNT IF IC AT ION 


2 


MULTIPLE 


3 


ILLUSTRATION 


1C 


NATIONAL 


5 


implementation 


0 


NATURAL 


2 


IMDFPFNDFNT 


l 


NATURAL LANGUAGF 


38 


INDEX 


49 


NEEDS 


9 


INDEX TNG 


64 


NETWORK 


14 


l NFFPFNCF 


9 


NOISF 


11 


INFO* PFTRTEVAL 


84 


NON-CONVENT ION A L 


? 


INFO* SCIFMCE 


9 


NON-DISCRIMINANT 


1 


information 


59 


NON-FILE 


1 


INPUT 


28 


NON-RANOOM 


1 


INTELLECTUAL 


4 


NON-RELEVANT 


3 


INTERDISC IPLINAR 


5 


NORMALIZED 


0 


INTERFACE 


2 


NOTATION 


10 


intfrrr FT 


7 


NUMBER 


1 


INTRODUCTORY 


8 


NUMERIC 


3 


INTUITIVE 


2 


OBJECTIVE 


2 


inventory 


2 


OCCURRENCE 


C 


INVERTED 


0 


OFF-LINE 


? 


IPRPLFV ANT 


5 


ON-LINE 


3 


ITERATIVE 


1 


OPERATION 


4 
OPTIMIZATION 1 
ORDER 13 
ORGANIZATION 8 
OUTPUT 26 
PARAMETER 9 
PARSE 13 
PATENT 6 
PATTERN 5 
PERFORMANCE 5 
PERMUTED 6 
PERTINENT 5 
PHILOSOPHY 3 
PLANNING 3 
PRECISION 11 
PREDICTION 6 
PRINCIPLE 0 
PRINTING 1 
PROBABILITY 27 
PROCEDURE 11 
PROCEEDINGS 8 
PROCESSING 22 
PROFILE 2 
PROG. LANGUAGE 12 
PROGRAM 32 
PROGRAMMED 3 
PSYCHOLOGY 1 
PUNCHED 5 
PUNCTUATION 1 
QUALITATIVE 2 
QUANTITATIVE 2 
QUESTION 33 
QUESTION-ANSWER 26 
RANDOM 13 
RANDOM-ACCESS 5 
RANK 2 
READING 3 
REAL-TIME 1 
RECALL 20 
RECOGNITION 6 
RECORD 4 
RECURSIVE 5 
REDUNDANCY 6 
REFERENCE 26 
REJECTION 0 
RELATED 1 
RELATIONSHIP 30 
RELATIVE 6 
RELEVANCE 49 
RELEVANT 22 
REMOTE TERMINAL 7 
RESEARCH 14 
RESPONSE TIME 1 
RETRIEVAL 63 
RETRIEVAL SYSTEM 15 
REVIEW 5 
ROLE 9 
ROUTINE 

RULE 

SAMPLE 

SCANNING 

SCIENTIFIC 

SCOPE NOTE 

SEARCH CRITERIA 

SEARCH STRATEGY 

SEARCHING 

SECRECY 

SEE ALSO 

SEE-REFERENCE 

SELECTION 

SELECTIVE DISSEM 

SEMANTIC 

SEQUENCE 

SERVICE 

SET THEORY 

SETS 

SHELFLIST 

SIGNIFICANCE 

SIMULATION 

SIZE 

SMALL 

SOCIAL IMPLIC. 

SOFTWARE 

SORTING 

SOURCE 

SPECIALIZED 

SPECIFICITY 

STANDARDIZATION 

STAT ASSOCIATION 

STAT. ANALYSIS 

STAT. METHOD 

STATE-OF-THE-ART 

STATISTICAL 

STORAGE 

STRING 

STRUCTURE 

SUBJECT 

SUBJECT HEADING 

SUBJECT INDEXING 

SUBJECT-CATALOG. 

SUBROUTINE 

SUMMARY 

SURVEY 

SYMBOL 

SYMBOLIC LOGIC 

SYMPOSIUM 

SYNONYM 

SYNTACTIC ANAL. 

SYNTAX 

SYSTEM 

TABLE 

TAG 

TECHNICAL 



TECHNICAL REPORT 1 
TECHNOLOGY 8 
TELEGRAPHIC ABS. 3 
TERMINOLOGY 1 
TEST 16 
TEXT 20 
THEORY 20 
THESAURUS 30 
TIME 5 
TIME-SHARING 6 
TITLE 15 
TRANSFORMATION 19 
TRANSLATION 31 
TRANSLITERATION 0 
TRANSMISSION 3 
TREE 17 
TREE STRUCTURE 9 
TRUNCATION 3 
TYPE STYLE 0 
TYPE-SETTING 2 
TYPOGRAPHICAL 0 
UNION CATALOG 0 
UNITERM SYSTEM 5 
UPDATING 1 
USER 32 
UTILITY 4 
VALIDATION 2 
VALUE 12 
VARIABLE 12 
VECTOR 11 
VENN DIAGRAM 3 
VISUAL DIS. CON. 0 
VOCABULARY 18 
WEIGHT 19 
WEIGHT INDEXING 5 
WORD 36 
WORD ASSOCIATION 25 
WORD FREQUENCY 6 
LABSRC 3: A DETAILED DESCRIPTION 



1. Introduction 



LABSRC 3 is one of three search programs designed to teach and 
demonstrate information retrieval techniques to students and faculty 
of the Library School. Of the three programs, LABSRC 3 is by far the 
most sophisticated because it allows true interaction between the 
user and the program and also permits the user to submit requests in 
the form of Boolean expressions. This appendix describes LABSRC 3 
with particular emphasis on those features that distinguish it from 
the other two search programs. 



2. General 



The program allows requests in the form of Boolean expressions 
that consist of valid index terms joined together with the usual log- 
ical connectives. Weights can be assigned by the user to particular 
index terms and to parenthetic subexpressions, so that the relative 
importance of different parts of the query can be indicated. LABSRC 
3 also provides options so that the search can be performed in direct- 
match mode or associative-retrieval mode and so that relevance num- 
bers for the retrieved documents can be computed if the user so de- 
sires. When the options have been specified and the query submitted, 
LABSRC 3 searches the MASTER I file.* The program utilizes reasonably 
advanced techniques to minimize search time. The query, for instance, 
is analyzed by a parser that puts out directly executable code in the 
form of a subroutine that embodies the logic of the query. Since the 
logic is to be analyzed once for each document, this technique is su- 
perior to interpretive methods. 



LABSRC 3 asks six questions during a normal pass through the pro- 
gram so that the options and the query can be input by the user. 
These questions are: 

Q01 Do you want word association? 
Q02 Specify association file 
Q03 Do you want scoring? 
Q04 Enter Boolean expression 
Q05 Do you want results printed? 
Q06 Specify restart or exit. 

The questions are self-explanatory and the answers given by the 
user are straightforward. However, at any time, instead of answering 
the question with a relevant answer, the user can input a command in 
the command language and essentially take over control to explore the 
program fully. This is what makes LABSRC 3 truly interactive. Enter- 
ing commands is therefore quite easy and natural since LABSRC 3 fig- 
ures out whether the reply is an answer to the question or a command. 

*If the user so desires, the query may be expanded using terms 
drawn from any one of three files of term association data. 

3. Boolean Expressions and Request Formulation 

The syntax of Boolean Expressions accepted by LABSRC 3 as legal 
requests is given below in Backus-Naur Form. 

<Index Term> = any legal term that belongs to the thesaurus 

<Decimal no.> = any 4 digit decimal number n, where 0 ≤ n ≤ .9999 

<Primary> = '<Index Term>' | ( <Boolean Expression> ) 

<Primary Exp.> = <Primary> | NOT <Primary> 

<Secondary> = <Primary Exp.> | <Decimal no.> * <Primary> 

<AND-Exp.> = <Secondary> | <Secondary> AND <AND-Exp.> 

<Boolean Expression> = <AND-Exp.> | <Boolean Expression> OR <AND-Exp.> 

<Request> = <Boolean Expression> 

Note that any index term or parenthetic sub-expression can be 
weighted down by multiplying it by a decimal number. 

Ex. without weights: 

('Language' OR 'syntax') AND NOT 'grammar' 

Ex. with weights: 

0.5670* ('Language' OR 0.5000* 'syntax') AND NOT 'grammar' 

It is suggested that weights for individual index terms be as- 
signed through the ASSIGN command rather than explicitly typing them 
in. 



When Boolean expressions that are longer than one line are to 
be typed in, the last character in any incomplete line must be @. 
LABSRC 3 concatenates the lines together. The carriage is returned 
to indicate that an incomplete line has been input. 
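
The grammar above can be sketched as a small recursive-descent parser. The following Python is only an illustration under stated assumptions, not the LABSRC 3 parser: the names `tokenize`, `Parser`, `parse_request` and `join_continued`, the nested-tuple tree, and the upper-casing of index terms are all hypothetical. The original program emitted executable code rather than a tree.

```python
import re

# One token: a quoted index term, a decimal weight, a parenthesis or '*',
# or one of the logical connectives.
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\d*\.\d+)|([()*])|(AND|OR|NOT)\b)")


def join_continued(lines):
    """Concatenate typed lines; an incomplete line ends with '@'."""
    return " ".join(line.rstrip().rstrip("@") for line in lines)


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        term, number, punct, word = m.groups()
        if term is not None:
            tokens.append(("TERM", term.upper()))
        elif number is not None:
            tokens.append(("NUM", float(number)))
        else:
            tokens.append((punct or word, None))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, found {self.peek()}")
        value = self.tokens[self.i][1]
        self.i += 1
        return value

    def boolean_expression(self):   # <AND-Exp.> { OR <AND-Exp.> }
        node = self.and_exp()
        while self.peek() == "OR":
            self.take("OR")
            node = ("OR", node, self.and_exp())
        return node

    def and_exp(self):              # <Secondary> [ AND <AND-Exp.> ]
        node = self.secondary()
        if self.peek() == "AND":
            self.take("AND")
            node = ("AND", node, self.and_exp())
        return node

    def secondary(self):            # <Primary Exp.> | <Decimal no.> * <Primary>
        if self.peek() == "NUM":
            weight = self.take("NUM")
            self.take("*")
            return ("WEIGHT", weight, self.primary())
        if self.peek() == "NOT":
            self.take("NOT")
            return ("NOT", self.primary())
        return self.primary()

    def primary(self):              # '<Index Term>' | ( <Boolean Expression> )
        if self.peek() == "(":
            self.take("(")
            node = self.boolean_expression()
            self.take(")")
            return node
        return ("TERM", self.take("TERM"))


def parse_request(text):
    parser = Parser(tokenize(text))
    tree = parser.boolean_expression()
    if parser.peek() is not None:
        raise ValueError("trailing input after request")
    return tree
```

For instance, `parse_request("('Language' OR 'syntax') AND NOT 'grammar'")` yields an AND node whose left branch is the OR of the two terms and whose right branch is the negated term. This sketch does not check that a term belongs to the thesaurus.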

4. Associative Retrieval and Scoring 

As pointed out earlier, LABSRC 3 is capable of searching the 
MASTERI file in either direct match or associative mode. In the 
former mode the user's request is used as submitted, while in the 
latter mode the request is extended to include more terms from the 
association file selected by the user. Each index term can be ex- 
tended to include up to a maximum of 4 associated terms. Some or 
all of these terms can be later eliminated from the search by the 
user with the help of the command language. 

When scoring is specified, LABSRC 3 calculates relevance numbers 
that reflect the closeness between the request and the document. When 
direct match searching is asked for, scoring is obviously unnecessary 
but can be specified. 

Since association values reflect the degree of correlation be- 
tween any two terms , these values are used to obtain some measure 
of relevance for the documents found. 



When scoring is utilized, an AND between any two terms or sub- 
expressions results in their values being multiplied. An OR results 
in the term or subexpression with the higher association value being 
chosen. 

The NOT in LABSRC 3 is a unary operator and, when scoring is 
utilized, is treated as follows: 

NOT a1 where a1 is a term or a subexpression 

If the effective value of a1 ≠ 0, it is made 0. If the effective 
value of a1 = 0, it is made .9999. 

When weights have been specified, the program uses simple 
multiplication to incorporate the effects of weights in the rele- 
vance number. 



The following example assumes that the user has requested scor- 
ing and word association using the KUHNS W file. Let us also assume 
that the input expression is: 

.500* ('LANGUAGE' AND 'GRAMMAR') AND NOT 'SYNTAX' 

From the KUHNS W file the input terms will be expanded as follows: 



                   Association                       Association 
Term               Value          Term               Value 

Language           .9999          Grammar            .9999 
Objective          .8381          Parse              .4469 
Social Implic.     .8381          Syntactic Anal.    .4379 
Standardization    .8381          Fact Retrieval     .4085 
Related            .8381          Set Theory         .4085 



Even though word association has been requested, 'Syntax' will not 
be expanded since it is a negated term. 

Now, during the search let us assume that we are looking at 
document A0103, which is indexed under the following terms: 




Algorithm      Analysis           Automatic          Classification 
Clump          Cluster            Computer           Dictionary 
Document       Documentation      Evaluation         Indexing 
Information    Keyword            Language           Natural Language 
Output         Processing         Prog. Language     Relevance 
Retrieval      Retrieval System   State-of-the-Art   Statistical 
Survey         Syntactic Anal.    Thesaurus          Time Sharing 
Translation 



Since 'Language' occurs in the document, it will be represented by a 
value of .999. Although 'grammar' does not occur, 'Syntactic anal.' 
does. 'Grammar' will be represented by a value of .4379. 'Syntax' 
does not occur. Therefore, A0103 satisfies the input expression and 
its relevance value is .500 x (.999 x .437) x .999 which works out to 
.218. 



If weights had been assigned to individual terms , these weights 
would have been multiplied into the effective values for each term 
before the evaluation of the expression. 
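
The scoring rules and the worked example above can be sketched as follows. This is a minimal illustration, not the original subroutine: the nested-tuple query form and the function name `score` are hypothetical, and two points the text leaves open are assumed here, namely that the highest-valued matching associated term represents an index term, and that a score of zero means the document does not satisfy the request.

```python
FULL = 0.9999  # value the text gives a term that occurs in the document


def score(node, doc_terms, assoc, expand=True):
    """Relevance value of one document for a query tree (0.0: no match)."""
    kind = node[0]
    if kind == "TERM":
        candidates = {node[1]: FULL}
        if expand:  # negated terms are not expanded
            candidates.update(assoc.get(node[1], {}))
        # Assumption: the best matching candidate represents the term.
        return max((v for t, v in candidates.items() if t in doc_terms),
                   default=0.0)
    if kind == "AND":   # values are multiplied
        return (score(node[1], doc_terms, assoc, expand)
                * score(node[2], doc_terms, assoc, expand))
    if kind == "OR":    # the higher value is chosen
        return max(score(node[1], doc_terms, assoc, expand),
                   score(node[2], doc_terms, assoc, expand))
    if kind == "NOT":   # nonzero becomes 0, zero becomes .9999
        inner = score(node[1], doc_terms, assoc, expand=False)
        return 0.0 if inner != 0 else FULL
    if kind == "WEIGHT":  # weights enter by simple multiplication
        return node[1] * score(node[2], doc_terms, assoc, expand)
    raise ValueError(f"unknown node type: {kind}")


# Association data from the example (KUHNS W file).
ASSOC = {
    "LANGUAGE": {"OBJECTIVE": 0.8381, "SOCIAL IMPLIC.": 0.8381,
                 "STANDARDIZATION": 0.8381, "RELATED": 0.8381},
    "GRAMMAR": {"PARSE": 0.4469, "SYNTACTIC ANAL.": 0.4379,
                "FACT RETRIEVAL": 0.4085, "SET THEORY": 0.4085},
}

# Index terms of document A0103 as listed above.
A0103 = {
    "ALGORITHM", "ANALYSIS", "AUTOMATIC", "CLASSIFICATION", "CLUMP",
    "CLUSTER", "COMPUTER", "DICTIONARY", "DOCUMENT", "DOCUMENTATION",
    "EVALUATION", "INDEXING", "INFORMATION", "KEYWORD", "LANGUAGE",
    "NATURAL LANGUAGE", "OUTPUT", "PROCESSING", "PROG. LANGUAGE",
    "RELEVANCE", "RETRIEVAL", "RETRIEVAL SYSTEM", "STATE-OF-THE-ART",
    "STATISTICAL", "SURVEY", "SYNTACTIC ANAL.", "THESAURUS",
    "TIME SHARING", "TRANSLATION",
}

# .500* ('LANGUAGE' AND 'GRAMMAR') AND NOT 'SYNTAX'
QUERY = ("AND",
         ("WEIGHT", 0.5, ("AND", ("TERM", "LANGUAGE"), ("TERM", "GRAMMAR"))),
         ("NOT", ("TERM", "SYNTAX")))

relevance = score(QUERY, A0103, ASSOC)
print(round(relevance, 4))
```

With the untruncated four-digit values the product is .500 x (.9999 x .4379) x .9999, or about .2189. A document that also carried 'Syntax' would score 0 under these rules and would therefore be rejected.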

5. The Command Language 

Design criteria. The command language was designed with three 
main objectives in mind. First, the language should be easy to use 
and should be equally amenable to novice and sophisticated users. 
Second, it should allow the user to interact with the program effec- 
tively and should familiarize the user with all aspects of LABSRC 3. 
Lastly, the language should be easy to implement. 

The first objective was met by allowing commands to be input in 
pseudo-natural language. A text analyser was written that analyses 
the given command and transforms it into an internal canonical form 
containing the relevant parts of the command. The second was met by 
permitting a large variety of commands. The last objective was met 
by specifying that the commands be in the form of a verb and a pred- 
icate. 

Forms of commands. As indicated above, a command consists of a 
verb and a predicate. The verb can be one of the following: 

 1) Display               8) Replace str. wt. 
 2) Count                 9) Execute 
 3) Modify               10) Initialize 
 4) Search               11) Assign 
 5) Proceed              12) Go to 
 6) Replace              13) Sorta 
 7) Replace op. wt.      14) Sortd 
                         15) Exit 
The predicate consists of any sentence in natural language containing 
keywords or a cryptic form consisting of one of the same keywords 
(with the exception of the predicate for EXECUTE). 

For example, the DISPLAY and COUNT commands result in output to 
the terminal. Since mechanical terminals and CRT terminals over a 
slow speed communications line are fairly slow, the COUNT command can 
be used to find out how much information is going to be output. The 
DISPLAY command can then be used to output the data or parts of the 
data; for example: 

COUNT the number of documents found 

DISPLAY all documents with a relevance value *GT* .1234 

DISPLAY all the most highly associated terms 

DISPLAY the terms associated with 'grammar' 

These commands, as shown, are perfectly valid. As can be seen, the 
predicate can be in natural language. The keywords have been under- 
lined. 

One should note that keywords and numbers in the predicate result 
in the outputting of parts of lists in memory. A sentence without key- 
words will generally output the whole list. Therefore, a naive user 
will generally get more than he wants. For example, the command DIS- 
PLAY association data (no keywords) will result in the whole associa- 
tion table being displayed. The COMMAND analyzer checks for the order 
of the keywords, among other things, and issues diagnostic messages 
when there is ambiguity. 

It is also possible for the user to enter his request in a cryp- 
tic form containing just the keywords, for brevity. The analyzer con- 
verts the forms below into a 24 byte ’instruction’ which is then in- 
terpreted by various routines. An advanced programmer may prefer an 
assembler-like language. This is done by using the EXECUTE commands: 

DISPLAY all terms associated with ’grammar’ with an 
association value *LT*.5678 

DISPLAY ’grammar’ *LT*.5678 

EXECUTE D4 grammar , , . 5678 

The above three commands are equivalent. The predicate of the EXECUTE 
consists of the subfields of the internal canonical form produced by 
the analyzer. 

Use of Commands. The questions asked by LABSRC 3 during a normal 
pass* are identified by numbers Q01-Q06 as shown in Section 2. 

a) Replies to Q01, Q02, Q03, Q05, Q06 can be commands. 



*A pass that has not been interrupted with a command. 
b) A reply to Q04 must not be a command. 

c) Once a command has been entered, the normal predefined pro- 
gram flow is no longer in operation and any number of com- 
mands may be given. 

d) It is suggested that during the first pass through the pro- 
gram or after a "restart" reply to Q06, no commands are 
typed in as answers to Q01-Q04 (other than a forward default 
branch - explained later). 

6. The Commands 

This section describes the actual commands in detail. They are 
described according to their functions as shown below. 

6.1 Branching Commands 

GO TO Q__ or GO TO Q__S causes the program to branch to the 
question number indicated, i.e., normal program flow is interrupted. 
There are two kinds of branches: forward and backward. If the GO 
TO Q__(S) command refers to a question number less than or equal to 
the present question number where the command has been typed in, 
this branch is referred to as a backward branch. Otherwise it is a 
forward branch. 

Backward branches do not require any further explanation. 

If the branch is a forward branch, a GO TO Q__ causes a branch 
using default answers (see table below) for all intermediate questions 
skipped over. A GO TO Q__S forward branch causes a branch using the 
most current answers (specified by the user) for all intermediate 
questions skipped over. 



The PROCEED command returns control to normal program flow. If 
at any time flow has been interrupted by commands, PROCEED causes 
the program to branch to the next question. 

EXIT causes the program to exit and control is returned to the 
Terminal Monitor System. 



Question No.     Default Options 

Q01              yes 
Q02              DOYLE 
Q03              yes 
Q04              previous Boolean expression 
Q05              no 
Q06              exit 
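
The branching rules and the default table can be sketched as below. This is a hypothetical illustration only: the function name `branch` and its arguments are not from LABSRC 3, the "intermediate questions skipped over" are assumed to run from the question where the command was typed up to, but not including, the target, and a GO TO Q__S branch is assumed to fall back on the default when the user has not yet answered a skipped question.

```python
# Default options from the table above, keyed by question number.
DEFAULTS = {
    1: "yes",
    2: "DOYLE",
    3: "yes",
    4: "previous Boolean expression",
    5: "no",
    6: "exit",
}


def branch(current, target, current_answers, use_saved=False):
    """Classify GO TO Q<target> typed at question <current>.

    Returns the kind of branch and the answers supplied for the skipped
    questions. use_saved=True models the GO TO Q__S form.
    """
    if target <= current:
        # Less than or equal to the present question: backward branch.
        return "backward", {}
    source = current_answers if use_saved else DEFAULTS
    skipped = range(current, target)  # assumption: current .. target-1
    return "forward", {q: source.get(q, DEFAULTS[q]) for q in skipped}
```

For example, `branch(2, 5, {})` is a forward branch that fills Q02 through Q04 with DOYLE, yes, and the previous Boolean expression, while `branch(5, 3, {})` is a backward branch and supplies nothing.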



6.2 Commands that Assign and Edit Weights for Boolean Expressions 



a) Weights may be assigned explicitly in the Boolean expression 
with the * operator. These weights are referred to below as string 
weights. String weights, of course, may be weights for individual 
index terms or for parenthetic subexpressions, i.e., * must be fol- 
lowed by an index term or a left parenthesis. 

b) Weights may be assigned to individual operands (index terms) 
in the Boolean expression at any time by the ASSIGN .____ to '____' 
command. These weights are called operand weights. 

c) String weights may be changed with the REPLACE STR. WT. 
.____ by .____ command. All string weights equal to the first 
weight in the command will be replaced by the second. 

d) Operand weights may be changed, similarly to c), by the REPLACE 
OP. WT. .____ by .____ command. 

e) A REPLACE .____ by .____ command results in both string 
and operand weights being changed. 

Notes to a-e above:

a-e: The weights must be 4-digit decimal numbers between 0 and 1.

a-e: When editing has been done, remember that the weights
     assigned are the most current set of weights.

a-e: Diagnostics are provided if the weights in a command are
     greater than 1 or if the index term in the ASSIGN command
     does not exist.

     An index term may be eliminated from the search by
     assigning a weight of .0000.

a,b: Note that weights for index terms can be string weights or
     can be entered as operand weights with the ASSIGN command.

c-e: If the first weight in any replace command does not exist,
     no modification is done. No diagnostic is provided.
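
The rules in a) through e) and the notes above can be illustrated with a small sketch. It is not the program's implementation: the class WeightEditor, its method names, and the use of Python exceptions to stand in for the diagnostics are all assumptions made for illustration.

```python
from decimal import Decimal


class WeightEditor:
    """Hypothetical model of string weights and operand weights."""

    def __init__(self, index_terms, string_weights=()):
        self.index_terms = set(index_terms)
        # String weights, in the order they appear in the expression.
        self.string_weights = [self._weight(w) for w in string_weights]
        # Operand weights, keyed by index term (set with ASSIGN).
        self.operand_weights = {}

    @staticmethod
    def _weight(text):
        value = Decimal(text)
        if not Decimal(0) <= value <= Decimal(1):
            # Stands in for the diagnostic on out-of-range weights.
            raise ValueError(f"weight {text} is not between 0 and 1")
        return value

    def assign(self, weight, term):
        """ASSIGN .---- to 'index term'."""
        if term not in self.index_terms:
            # Stands in for the diagnostic on an unknown index term.
            raise KeyError(f"index term {term!r} does not exist")
        self.operand_weights[term] = self._weight(weight)

    def replace(self, old, new, string=True, operand=True):
        """REPLACE [STR. WT. | OP. WT.] .---- by .----.

        If no weight equals `old`, nothing changes and no
        diagnostic is raised, as the notes above describe.
        """
        old, new = self._weight(old), self._weight(new)
        if string:
            self.string_weights = [
                new if w == old else w for w in self.string_weights
            ]
        if operand:
            for term, w in list(self.operand_weights.items()):
                if w == old:
                    self.operand_weights[term] = new


editor = WeightEditor(["A", "B"], [".5000", ".2500"])
editor.assign(".5000", "A")
editor.replace(".5000", ".7500")  # plain REPLACE: both kinds change
print(editor.string_weights, editor.operand_weights)
```

Decimal values are used rather than floats so that "equal to the first weight" is an exact comparison of 4-digit decimals.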

6.3 Commands that Edit Association Data 

This is primarily done by the SEARCH and/or MODIFY commands. If
n terms exist in the Boolean expression, one can visualize the
association data as a table of n rows and 5 columns. Each row
corresponds to the original term and its four associated terms along
with their association values. The above commands permit the user to
selectively eliminate terms from the search. (Terms can only be
eliminated. A term previously eliminated cannot be reinstated unless
an INITIALIZE is executed. This command has no predicate.)
The commands allow the user to operate on one selected row at
a time or on all rows simultaneously. The SEARCH and MODIFY
commands can be classified into two types: those that eliminate a
certain number of columns, and those that eliminate terms on the
basis of whether their association values are greater than, equal
to, or less than some specified threshold. The difference between
SEARCH and MODIFY commands is that SEARCH automatically initiates
a search after the prescribed modification, whereas MODIFY does
not initiate a search.

Exs: SEARCH using only the most highly associated terms

SEARCH using terms *EQ*.9999



General Notes: 



a) Ex: MODIFY to use association to a depth of 2 terms 

implies that only the two most highly associated terms should be 
used. 



b) An INITIALIZE restores the table to the state correspond- 
ing to the most current answers to Q01 and Q04. 

c) A change in the answer to Q02 is not reflected in the table 
until a search has been made. 

d) Note that the original terms themselves can be eliminated
from the search with the SEARCH or MODIFY commands.

Ex: SEARCH using terms *LT*.9999
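
A minimal sketch of the n-by-5 association table and the two kinds of elimination may make the behaviour above easier to follow. Several things here are assumptions rather than facts from the program: the class and method names, the sample terms and values, the reading of "using terms *OP* value" as "keep only the terms that satisfy the comparison", and the value .9999 given to the original term in column 0 (chosen so that *LT*.9999 eliminates the original terms, as in note d).

```python
import operator
from decimal import Decimal

OPS = {"GT": operator.gt, "LT": operator.lt, "EQ": operator.eq}
ORIGINAL = Decimal(".9999")  # assumed value for the original term


class AssociationTable:
    def __init__(self, rows):
        # rows: {original term: [(associated term, value), ...]},
        # at most four per row, most highly associated first.
        self._rows = {
            term: [(term, ORIGINAL)] + list(assoc)
            for term, assoc in rows.items()
        }
        self._eliminated = set()  # (row, column) positions

    def _targets(self, term):
        return [term] if term is not None else list(self._rows)

    def use_depth(self, depth, term=None):
        """Keep the original term plus the `depth` most associated."""
        for row in self._targets(term):
            for col in range(depth + 1, len(self._rows[row])):
                self._eliminated.add((row, col))

    def use_threshold(self, op, value, term=None):
        """Eliminate every term whose value fails `op value`."""
        keep, value = OPS[op], Decimal(value)
        for row in self._targets(term):
            for col, (_, v) in enumerate(self._rows[row]):
                if not keep(v, value):
                    self._eliminated.add((row, col))

    def initialize(self):
        """INITIALIZE: the only way to reinstate eliminated terms."""
        self._eliminated.clear()

    def active_terms(self):
        return [
            name
            for row, cells in self._rows.items()
            for col, (name, _) in enumerate(cells)
            if (row, col) not in self._eliminated
        ]


D = Decimal
table = AssociationTable({
    "COMPUTERS": [("PROGRAMMING", D(".8000")), ("AUTOMATION", D(".6000")),
                  ("DATA", D(".4000")), ("LOGIC", D(".2000"))],
})
table.use_depth(2)                    # depth of 2 terms
table.use_threshold("LT", ".9999")    # drops the original term as well
print(table.active_terms())
```

Because eliminations are only ever added to the set, a later use_depth(4) cannot bring a term back; only initialize() does.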

6.4 Commands Relating to the Display of Documents

a) SORTD and SORTA sort the documents in descending or
ascending order of relevance, respectively. These make sense, of
course, only when scoring has been asked for. These commands have
no predicate.

b) The COUNT commands pertaining to documents count the number 
of documents (relative to a threshold if specified). 

Ex: COUNT documents with relevance values *GT*.5000

c) The DISPLAY commands pertaining to documents display a 
specified number of documents (relative to a threshold, if specified). 

Ex: DISPLAY 7 documents 



Notes to a-c above:

a,c: With the combination of the SORT (A/D) and DISPLAY commands,
     selected portions of the list of documents found can be
     output.

c:   The commands allow a certain amount of guessing. Ex: Assume
     DISPLAY 8 documents *GT*.5000 is input. If only 5
     documents have relevance values greater than .5000, only
     5 will be output. If 15 such documents exist, only 8
     will be output.
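
The interplay of SORTD/SORTA, COUNT and DISPLAY described above can be sketched in a few lines. The function names and the representation of a document as an (accession number, relevance) pair are hypothetical, chosen only for illustration.

```python
import operator

OPS = {"GT": operator.gt, "LT": operator.lt, "EQ": operator.eq}


def sort_documents(docs, descending=True):
    """SORTD (descending=True) or SORTA (descending=False)."""
    return sorted(docs, key=lambda doc: doc[1], reverse=descending)


def select(docs, op=None, value=None):
    """Documents satisfying the optional threshold, in list order."""
    if op is None:
        return list(docs)
    return [doc for doc in docs if OPS[op](doc[1], value)]


def count_documents(docs, op=None, value=None):
    """COUNT documents [*op* value]."""
    return len(select(docs, op, value))


def display_documents(docs, limit=None, op=None, value=None):
    """DISPLAY [n] documents [*op* value].

    If fewer than `limit` documents qualify, all of them are
    returned; if more qualify, only the first `limit` are.
    """
    chosen = select(docs, op, value)
    return chosen if limit is None else chosen[:limit]


# Twenty sample documents with relevance .05, .10, ..., 1.00.
docs = [(number, number / 20) for number in range(1, 21)]
ranked = sort_documents(docs)                      # SORTD
print(count_documents(ranked, "GT", 0.5))          # COUNT ... *GT*.5000
print(display_documents(ranked, 8, "GT", 0.5))     # DISPLAY 8 ... *GT*.5000
```

Sorting first and then displaying with a limit is what lets the user output a selected portion of the list, as note a,c points out.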

6.5 Commands Pertaining to the Display of Association Data 

a) The COUNT commands pertaining to terms allow the user to 
count the number of terms relative to a specified association value. 

Ex: COUNT terms *EQ*.8000 

b) The DISPLAY commands pertaining to terms allow the user to
display parts of the association table. Here a selected row or all
rows can be displayed to any depth (maximum of 4).

Ex: DISPLAY the most highly associated terms. 

Notes to a-b above:

b: DISPLAYing terms relative to a threshold is not permitted.

b: #'s are printed after terms that are not to be included in
   the search. These, of course, are terms eliminated by
   SEARCH/MODIFY commands and/or terms eliminated due to their
   appearance in negated substrings in the Boolean expression.

Table 1 indicates the forms of legal commands. 



TABLE 1: FORMS OF LEGAL COMMANDS

COMMAND VERB             PREDICATE KEYWORDS

GO TO                    Q-
GO TO                    Q-S
PROCEED                  no keywords
EXIT                     no keywords
ASSIGN                   .----, 'index term'
REPLACE OP. WT.          .----, .----
REPLACE STR. WT.         .----, .----
REPLACE                  .----, .----
SEARCH                   no keywords
SEARCH/MODIFY            no
SEARCH/MODIFY            no, 'index term'
SEARCH/MODIFY            x1, 'index term'
SEARCH/MODIFY            x1
SEARCH/MODIFY            'index term', *x2*, .----
SEARCH/MODIFY            *x2*, .----
SEARCH/MODIFY            most, 'index term'
SEARCH/MODIFY            most
INITIALISE/INITIALIZE    no keywords
SORTD                    no keywords
SORTA                    no keywords
COUNT                    documents
COUNT                    documents, *x2*, .----
DISPLAY                  x3 documents
DISPLAY                  x3 documents, *x2*, .----
COUNT                    'index term', *x2*, .----
COUNT                    *x2*, .----
DISPLAY                  no keywords
DISPLAY                  'index term'
DISPLAY                  x1, 'index term'
DISPLAY                  x1
DISPLAY                  most
DISPLAY                  most, 'index term'

LEGEND

-  (in Q-)   = 1, 2, 3, 4, 5, or 6
index term   = any operand in Boolean expression
----         = 4 digit decimal no.
no           = no association, as in SEARCH USING NO ASSOC.
x1           = 1, 2, 3, or 4
x2           = GT, LT, or EQ
x3           is optional; it may be absent or be a number
UTILITY PROGRAMS



MASTER I Generator. Generates from cards an indexed file
and corresponding printed output. The file is indexed on document
accession number and each entry contains the index terms for that
document.



INVERT I Generator. Generates from the MASTER I file an 
inverted file (with respect to index terms) and corresponding 
printed output. INVERT I is a sequential file and each entry 
contains the accession numbers of documents assigned a particular 
index term and the total number of documents using that term. 
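
The inversion step can be sketched as below. The in-memory dictionaries stand in for the MASTER I and INVERT I files, and the function name and sample data are hypothetical; the actual file layouts are not given in this section.

```python
def invert(master):
    """Build an inverted file from a master file.

    master:  {accession number: [index terms]}
    returns: {index term: (accession numbers, number of documents)},
             with terms in alphabetical order.
    """
    postings = {}
    for accession in sorted(master):
        for term in master[accession]:
            documents = postings.setdefault(term, [])
            if accession not in documents:  # ignore repeated terms
                documents.append(accession)
    return {
        term: (documents, len(documents))
        for term, documents in sorted(postings.items())
    }


master = {
    101: ["INDEXING", "RETRIEVAL"],
    102: ["RETRIEVAL"],
    103: ["INDEXING", "RETRIEVAL", "THESAURI"],
}
for term, (documents, total) in invert(master).items():
    print(term, documents, total)
```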



CO-OCCURRENCE Generator. Generates from INVERT I a sequential
file and printed output. Each record corresponds to an index term
and is one row of the co-occurrence matrix (i.e., the matrix whose
elements are the number of times a particular pair of index terms
co-occur in the MASTER I file). The printout from this program
permits one to readily note the number of co-occurrences of any
pair of terms in the file.
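
As an illustration of how one row of the co-occurrence matrix follows from the inverted file, the sketch below counts, for each pair of terms, the documents to which both are assigned. The function name and sample data are hypothetical, and leaving the diagonal out of each row is an assumption.

```python
def cooccurrence(inverted):
    """Co-occurrence rows from an inverted file.

    inverted: {index term: [accession numbers]}
    returns:  {term a: {term b: documents indexed by both a and b}}
    """
    terms = sorted(inverted)
    doc_sets = {term: set(inverted[term]) for term in terms}
    return {
        a: {b: len(doc_sets[a] & doc_sets[b]) for b in terms if b != a}
        for a in terms
    }


inverted = {
    "INDEXING": [101, 103],
    "RETRIEVAL": [101, 102, 103],
    "THESAURI": [103],
}
for term, row in cooccurrence(inverted).items():
    print(term, row)
```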



ASSOCIATION COEFFICIENT Generator. Generates an indexed file
(indexed on index terms) and printed output. Each entry contains
the four index terms most highly associated with the header term
together with their coefficients of association. The program is
set up so that different associative measures may be generated by
using different subroutines.
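
The idea of a pluggable associative measure can be sketched as follows. The measure shown (co-occurrences divided by the number of documents carrying either term) is an assumption chosen only as a simple example; this section does not say which measures the program's subroutines actually compute. The function names and sample data are likewise hypothetical.

```python
def overlap_ratio(n_ab, n_a, n_b):
    """Example measure (an assumption): co-occurrences over the
    number of documents carrying either term."""
    either = n_a + n_b - n_ab
    return n_ab / either if either else 0.0


def top_associations(doc_counts, cooccur, measure=overlap_ratio, k=4):
    """For each header term, the k most highly associated terms.

    doc_counts: {term: number of documents using the term}
    cooccur:    {term a: {term b: co-occurrence count}}
    measure:    any function of (n_ab, n_a, n_b); swapping it plays
                the role of using a different subroutine.
    """
    table = {}
    for a, row in cooccur.items():
        scored = [
            (b, measure(n_ab, doc_counts[a], doc_counts[b]))
            for b, n_ab in row.items()
            if n_ab > 0
        ]
        # Highest coefficient first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        table[a] = scored[:k]
    return table


doc_counts = {"INDEXING": 2, "RETRIEVAL": 3, "THESAURI": 1}
cooccur = {
    "INDEXING": {"RETRIEVAL": 2, "THESAURI": 1},
    "RETRIEVAL": {"INDEXING": 2, "THESAURI": 1},
    "THESAURI": {"INDEXING": 1, "RETRIEVAL": 1},
}
for term, entry in top_associations(doc_counts, cooccur).items():
    print(term, entry)
```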



MASTER B Generator. Generates a sequential file and produces
an equivalent printout. Each entry, defined by a document accession
number, contains the bibliographic information (author, title,
publisher, etc.) for that document.



INVERT B Generator. Inverts the MASTER B file with respect
to author and generates a printed output. Documents written by
more than one author are listed under each author and "see also"
notes refer to the other author(s).
DOCUMENT ATTRIBUTE CODES



(Reprinted from An Experimental Inquiry into [...], Laura Gould
et al., Institute of Library Research, Berkeley, January 1969,
pp. 93-95. Funded by National Science Foundation Grant No. GN643.)


A. MAJOR CODES

Code  Meaning                                    Example

AD    Author-assigned descriptor                 110
AU    Author (personal) of corpus or cited doc.  BAR-HILLEL, Y.
BS    Book series plus series number             NBS MISC PUBL*NO.269
CA    Corporate author                           RAND
ED    Editor of a book                           LIVINGSTON, H.H.
ID    Indexer-assigned descriptor                3
JO    Journal name plus issue number             JACM*VOL.1,NO.2
LD    Lab descriptor                             279
PU    Publisher                                  SPARTAN
RA    Review author                              OPLER, A.
RD    Reviewer-assigned descriptor               7
RJ    Review journal name plus issue number      COMPUT REV*VOL.6,NO.5
RS    Report series plus series number           ASTIA AD SERIES*AD NO.231606
XC    Computing Reviews index tag                3.7
XE    IEEE index tag                             PROGRAMMING
XM    Math Reviews index tag                     1
YR    Year of publication                        65
A. MAJOR CODES (cont.)

Code  Meaning       Example    Explanation

BI    Bibliography  B2 BI A1   Indicates that B2 is part of the
                               document file because it was found
                               in a bibliographic list at the end
                               of A1

CI    Citation      B2 CI A1   Indicates that B2 is part of the
                               document file because its content
                               was specifically discussed (i.e.
                               cited) in the body of A1

CM    Comment       B2 CM A1   Indicates that the content of B2
                               constitutes a comment upon or an
                               answer to A1

RE    Reference     B2 RE A1   Indicates that B2 is part of the
                               document file because it was
                               mentioned, without specific
                               discussion of its content, in the
                               body of A1


B. MINOR CODES

Code  Meaning       Example(s)        Explanation

CO    Collation     TH.,JUNE          Contains notation of thesis
                    PR.               (TH.), preprint (PR.), months
                    FALL              and other notes

P?    Pagination    176-9             Page numbers of document
                                      represented with a minimum
                                      number of digits

PR    Presented at  ANNUAL MEETING    Name of congress, meeting,
                    OF THE AMERICAN   etc. where document was
                    SOCIETY FOR       presented, usually including
                    ENGINEERING       date and place
B. MINOR CODES (cont.)

Code  Meaning       Example    Explanation

DU    Duplicate     B2 DU A1   Indicates that B2 is a duplicate of
                               A1, in the sense that it is identical
                               to it, but published elsewhere

RL    Related       B2 RL A1   Indicates that B2 and A1 are related
                               in the sense that they are parts of a
                               sequence of documents (e.g. PART 1,
                               PART 2, or VOL. 1, VOL. 2)

VE    Version       B2 VE A1   Indicates that B2 constitutes another
                               version of A1, which is modified,
                               expanded, etc., but not identical
                               with it (see DU above)


C. TITLE CODES

Code  Meaning          Example              Explanation

BO    Book             ICSI PROC 1958       Title of a book

JT    Journal title    PARAMETERS OF        Title or theme of a
      (theme)          INFORMATION          journal issue

RP    Report           AUTOMATIC            Title of a report
                       INDEXING             bound alone

BA    Book article     THE BASIC TYPES      Chapter of a book
                       OF INFORMATION

JA    Journal article  SYNTACTIC STRUC-     Title of a journal
                       TURE AND AMBIGUITY   article
                       IN ENGLISH

RR    Report           DOD USER STUDY,      Individual report bound
                       PHASE I              within a volume containing
                                            several reports
PROGRAM - STUDENT INTERACTION: TYPICAL SEQUENCE



THERE IS A CODE IN EFFECT FOR TRACINGS:

THOSE NUMBERED WITH ARABIC NUMERALS ARE FOR SUBJECT ADDED ENTRIES.

THOSE NUMBERED WITH ROMAN NUMERALS ARE FOR OTHER ELEMENTS, SUCH AS
TITLES.

FROM WHAT I HAVE JUST SAID, WHAT MIGHT THIS BE?

1. ROME--HISTORY--REPUBLIC, 265-30 B.C.

.ANOTHER ELEMENT                      [. INDICATES STUDENT INPUT]

TRY AGAIN. THE COMPLETE TERM.

.SUBJECT ENTRY

WHAT KIND OF ENTRY AND WHAT KIND OF DEVICE? TRY AGAIN.

.SUBJECT ADDED ENTRY

A SUBJECT ADDED ENTRY TRACING. YOU COULD TELL BY THE ARABIC NUMERAL
EVEN IF IT WEREN'T CLEAR FROM THE HEADING ITSELF.



I WANT YOU TO TRY IDENTIFYING THE TYPES OF TRACING LISTED BELOW, FROM
EXAMPLES WHICH I WILL GIVE YOU IN A MOMENT



TITLE ADDED ENTRY
JOINT AUTHOR ADDED ENTRY
SERIES ADDED ENTRY
SUBJECT ADDED ENTRY
TRANSLATOR ADDED ENTRY



READY? TAKE YOUR TIME. WHENEVER YOU WANT TO LOOK AT THE LIST, JUST
TYPE THE WORDS "REPEAT LIST."



SUPPOSE WE HAVE A BOOK ENTITLED "THE REPUBLIC OF ROME," BY
JOINT AUTHORS RENE L. PAGE AND JEAN LAVAL. THE FORMER HAS BEEN
SELECTED AS THE PRINCIPAL AUTHOR FOR PURPOSES OF MAIN ENTRY. THE BOOK
IS A TRANSLATION FROM THE FRENCH BY A. B. SEA. IT IS PART OF THE
"TEMPS PERDUS SERIES".



WHAT IS THIS TRACING: 1. ROME--HISTORY--REPUBLIC, 265-30 B.C.

.SUBJECT

TRY AGAIN. THREE WORDS.

.SUBJECT ENTRY TRACING

THAT WAS A SUBJECT ADDED ENTRY.

HOW ABOUT: 2. ROME--HISTORY--CIVIL WAR, 49-48 B.C.?

.SAME AS ABOVE

GOOD.

HOW ABOUT: 3. ROME--OFFICIALS AND EMPLOYEES

.DITTO

FINE. NOW WHAT IS THIS? II. LAVAL, JEAN, 1930-     , JT. AUTH.

.JOI
+++ TYPE AGAIN +++
.AUTHOR ENTRY

TRY AGAIN. ALL FOUR WORDS, PLEASE.

.JOINT AUTHOR MAIN ENTRY

THAT ONE WAS A JOINT AUTHOR ADDED ENTRY.
TRY   I. SEA, A. B., 1940-     , TR.

.LET ME CHECK THE LIST AGAIN



TITLE ADDED ENTRY
JOINT AUTHOR ADDED ENTRY
SERIES ADDED ENTRY
SUBJECT ADDED ENTRY
TRANSLATOR ADDED ENTRY

.OK, A.B. SEA LOOKS LIKE A TRANSLATOR ADDED ENTRY

YOU'RE DOING BEAUTIFULLY. HERE'S ANOTHER: IV. THE REPUBLIC OF
ROME

.TITLE ADDED
+++ TYPE AGAIN +++
.SUBJECT ADDED ENTRY

TRY AGAIN. THIS ONE IS SO-O-O-O EASY.

.TITLE ENTRY?

TITLE ADDED ENTRY. NON FELIX SUM.

WELL, MAYBE YOU CAN REDEEM YOURSELF WITH THIS FINAL ONE:

III. TEMPS PERDUS SERIES.

.SERIES ADDED ENTRY

VERY GOOD INDEED. THAT COVERS MOST OF THE TYPES OF ADDED ENTRY YOU
ARE LIKELY TO ENCOUNTER, WITH THE EXCEPTION OF ILLUSTRATOR ADDED
ENTRIES, WHICH ARE FAIRLY RARE BECAUSE THEY ARE USED ONLY WHEN THE
IDENTITY OF THE ILLUSTRATOR IS OF ESPECIAL INTEREST.



BASIC SOURCE PROGRAM FOR ABOVE 



KANCINSKY:  T :
            / ;
            T : THERE IS A CODE IN EFFECT FOR TRACINGS:
            /
            / THOSE NUMBERED WITH ARABIC NUMERALS ARE FOR SUBJECT ADDED ENTRIES.
            /
            / THOSE NUMBERED WITH ROMAN NUMERALS ARE FOR OTHER ELEMENTS, SUCH AS
            TITLES.;
            T : ;
            T : FROM WHAT I HAVE JUST SAID, WHAT MIGHT THIS BE?
            / 1. ROME--HISTORY--REPUBLIC, 265-30 B.C.;
MECHIKKU:   A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : SUBJECT ADDED ENTRY, SUBJECT ADDED ENTRY, TRACING SUBJECT,
                SUBJECT TRACING, TRACING SUBJECT, SUBJECT TRACING;
            G : A SUBJECT ADDED ENTRY TRACING. YOU COULD TELL BY THE ARABIC
                NUMERAL, EVEN IF IT WEREN'T CLEAR FROM THE HEADING ITSELF.;
            C : GJUMP TO ZZ538;
            R : TRACING, TRACING;
            G : YES, BUT FOR WHAT KIND OF ENTRY?;
            C : GJUMP TO MECHIKKU;
            R : ENTRY, ENTRY;
            G : WHAT KIND OF ENTRY AND WHAT KIND OF DEVICE? TRY AGAIN.;
            C : GJUMP TO MECHIKKU;
            R : [illegible in source];
            G : NOW IT COULDN'T POSSIBLY BE A TRACING FOR A TITLE ADDED ENTRY.
                THE ARABIC NUMERAL TELLS YOU. TRY AGAIN.;
            C : GMARK STALLY,
                GJUMP TO MECHIKKU;
            C : USE (Z535, Z536, Z537) ON (ZAP),
                TALLY ZAP,
                MARK (A..NILNUM, A..NILNUM, A..NILNUM, STALLY) ON (ZAP),
                JUMP TO (A..NILLOC, MECHIKKU, MECHIKKU, ZZ538) ON (ZAP);
Z535:       T : TRY AGAIN. THE COMPLETE TERM.;
Z536:       T : DID YOU FORGET "ADDED" ENTRY? TRY AGAIN, ANYWAY.;
Z537:       T : THE ANSWER IS "TRACING OF (OR FOR) A SUBJECT ADDED ENTRY," OR
                SIMPLY "SUBJECT ADDED ENTRY TRACING.";
ZZ538:      C : MARK ZAP;
            N : ZZ538;



BENET:      C : IF STALLY
                THEN ADD [?] TO LVOLLY,
                CLEAR STALLY;

TRACINGS:   T : I WANT YOU TO TRY IDENTIFYING THE TYPES OF TRACING LISTED
                BELOW, FROM EXAMPLES WHICH I WILL GIVE YOU IN A MOMENT;
            T :
            /
            / TITLE ADDED ENTRY
            / JOINT AUTHOR ADDED ENTRY
            / SERIES ADDED ENTRY
            / SUBJECT ADDED ENTRY
            / TRANSLATOR ADDED ENTRY;
            T : READY? TAKE YOUR TIME. WHENEVER YOU WANT TO LOOK AT THE LIST,
                JUST TYPE THE WORDS "REPEAT LIST.";
            T : SUPPOSE WE HAVE A BOOK ENTITLED "THE REPUBLIC OF ROME," BY
                JOINT AUTHORS RENE L. PAGE AND JEAN LAVAL. THE FORMER HAS BEEN
                SELECTED AS THE PRINCIPAL AUTHOR FOR PURPOSES OF MAIN ENTRY.
                THE BOOK IS A TRANSLATION FROM THE FRENCH BY A. B. SEA. IT IS
                PART OF THE "TEMPS PERDUS SERIES".;
            T : WHAT IS THIS TRACING:
                1. ROME--HISTORY--REPUBLIC, 265-30 B.C.;

BENEZET:    A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : [illegible in source];
            G : [illegible in source] CIVIL WAR, 49-48 B.C.?;
            C : GJUMP TO ZZ540;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENEZET;
            C : USE (Z539, Z540) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENEZET, ZZ540) ON (ZAP);
Z539:       T : TRY AGAIN. THREE WORDS.;
Z540:       T : THAT WAS A SUBJECT ADDED ENTRY.
            /
            / HOW ABOUT: 2. ROME--HISTORY--CIVIL WAR, 49-48 B.C.?;
ZZ540:      C : MARK ZAP;
            N : ZZ540;









BENGURION:  A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : SUBJECT ADDED ENTRY, SUBJECT ADDED ENTRY, SAME, SAME,
                LIKEWISE, LIKEWISE, DITTO, DITTO, ", ";
            G : GOOD.
            / HOW ABOUT: 3. ROME--OFFICIALS AND EMPLOYEES;
            C : GJUMP TO ZZ542;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENGURION;
            C : USE (Z541, Z542) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENGURION, ZZ542) ON (ZAP);
Z541:       T : TRY AGAIN. THREE WORDS.;
Z542:       T : THAT WAS ALSO A SUBJECT ADDED ENTRY.
            / TRY THIS ONE: 3. ROME--OFFICIALS AND EMPLOYEES;
ZZ542:      C : MARK ZAP;
            N : ZZ542;


— . 




BENCIT:     A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : SUBJECT ADDED ENTRY, SUBJECT ADDED ENTRY, SAME, SAME,
                LIKEWISE, LIKEWISE, DITTO, DITTO, ", ";
            G : FINE. NOW WHAT IS THIS? II. LAVAL, JEAN, 1930-     , JT.
                AUTH.;
            C : GJUMP TO ZZ544;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENCIT;
            C : USE (Z543, Z544) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENCIT, ZZ544) ON (ZAP);
Z543:       T : TRY AGAIN. ALL THREE WORDS, PLEASE.;
Z544:       T : THAT, AGAIN, WAS A SUBJECT ADDED ENTRY.
            /
            / WHAT ABOUT II. LAVAL, JEAN, 1930-     , JT. AUTH.;
ZZ544:      C : MARK ZAP;
            N : ZZ544;


BENSON:     A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : JOINT AUTHOR ADDED ENTRY, JOINT AUTHOR ADDED ENTRY;
            G : YES. NOW WHAT IS THIS? I. SEA, A. B., 1940-     , TR.;
            C : GJUMP TO ZZ546;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENSON;
            C : USE (Z545, Z546) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENSON, ZZ546) ON (ZAP);
Z545:       T : TRY AGAIN. ALL FOUR WORDS, PLEASE.;
Z546:       T : THAT ONE WAS A JOINT AUTHOR ADDED ENTRY.
            / TRY   I. SEA, A. B., 1940-     , TR.;
ZZ546:      C : MARK ZAP;
            N : ZZ546;



BENTLINK:   A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : TRANSLATOR ADDED ENTRY, TRANSLATOR ADDED ENTRY;
            G : YOU'RE DOING BEAUTIFULLY. HERE'S ANOTHER: IV. THE REPUBLIC OF
                ROME;
            C : GJUMP TO ZZ548;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENTLINK;
            R : ADDED ENTRY, ADDED ENTRY;
            G : WHAT KIND OF ADDED ENTRY? WHAT DOES THE "TR." MEAN?
                PLEASE REPEAT. (THREE WORDS.);
            C : GJUMP TO BENTLINK;
            C : USE (Z547, Z548) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENTLINK, ZZ548) ON (ZAP);
Z547:       T : TRY AGAIN.;
Z548:       T : YOU SHOULD HAVE ANSWERED "TRANSLATOR ADDED ENTRY."
            /
            / TRY ANOTHER: IV. THE REPUBLIC OF ROME.;
ZZ548:      C : MARK ZAP;
            N : ZZ548;







BENTLEY:    A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : TITLE ADDED ENTRY, TITLE ADDED ENTRY;
            G : [illegible in source] ONE MORE: III. TEMPS PERDUS SERIES.;
            C : GJUMP TO ZZ550;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENTLEY;
            C : USE (Z549, Z550) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENTLEY, ZZ550) ON (ZAP);
Z549:       T : TRY AGAIN. THIS ONE IS SO-O-O-O EASY.;
Z550:       T : TITLE ADDED ENTRY. NON FELIX SUM.
            /
            / WELL, MAYBE YOU CAN REDEEM YOURSELF WITH THIS FINAL ONE:
            / III. TEMPS PERDUS SERIES.;
ZZ550:      C : MARK ZAP;
            N : ZZ550;


* 


• 


BENTON:     A : ;
            R : DEBUG;
            C : GJUMP TO AUTHOR;
            R : SERIES ADDED ENTRY, SERIES ADDED ENTRY;
            G : VERY GOOD INDEED. THAT COVERS MOST OF THE TYPES OF ADDED ENTRY
                YOU ARE LIKELY TO ENCOUNTER, WITH THE EXCEPTION OF ILLUSTRATOR
                ADDED ENTRIES, WHICH ARE FAIRLY RARE BECAUSE THEY ARE USED ONLY
                WHEN THE IDENTITY OF THE ILLUSTRATOR IS OF ESPECIAL INTEREST.;
            C : GJUMP TO ZZ553;
            R : LIST;
            C : GUSE TRACINGS,
                GJUMP TO BENTON;
            C : USE (Z551, Z552) ON (ZAP),
                TALLY ZAP,
                TALLY (A..NILNUM, A..NILNUM, LVOLLY) ON (ZAP),
                JUMP TO (A..NILLOC, BENTON, ZZ553) ON (ZAP);
Z551:       T : TRY AGAIN.;
Z552:       T : THAT, "NAME", WAS A SERIES ADDED ENTRY.;
ZZ553:      C : MARK ZAP;
            N : ZZ553;

BERENGARIA: T :
            / ;
            T : WHAT IF YOU FIND NO TRACINGS AT THE BOTTOM OF A CARD?
            / IT MEANS THAT THERE ARE (VERY FEW) (3) (NO) (INDESCRIBABLE)
            / (NON-DISTINCTIVE) ADDED ENTRIES FOR THAT PARTICULAR BOOK.;
            R : "NEGAT";
            G : CORRECT. IT HAPPENS.;
            C : GJUMP TO ZZ555;
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